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Heuristic Numerical Work in Some Problems of 
Hydrodynamics 


By John R. Pasta and S. Ulam 


It is well known that in many problems of hydrodynamics one cannot, at the pres- 

ent time, obtain valid solutions in a closed form. The asymptotic behavior of the 
solutions and even more generally their qualitative properties are barely obtainable 
through analytical work alone. This is even true of some problems which are al- 
ready highly schematized by neglecting various physical parameters, like the change 
in the equation of state in a fluid or gas during its motion. One proceeds in numerical 
computations with the continuum of the fluid replaced by a finite network of dis- 
crete points, and thus replaces the partial differential equations by a system of dif- 
ference equations. The time variable, too, is replaced by a discrete succession of 
steps in time. This is the usual procedure in solving initial value problems which do 
not yield to analytical methods. In recent years the advent of electronic computing 
machines introduced the possibility of large scale experimentation in calculation of 
problems in more than one dimension. 

It is the purpose of the following discussion to outline some general properties of 
such numerical work and to propose several different methods for numerical compu- 
tations. Some numerical work already performed will be discussed in the sequel; it 
dates from 1952, when the authors first applied such computations on electronic 
computing machines. More recently, F. Harlow [1] has applied similar methods to 
calculations performed on electronic machines in Los Alamos. With the constant 
improvement in electronic computers, both as regards their speed and the size of 
the memory, it will be possible to perform more ambitious calculations; both the 
variety and the magnitude of the problems which can be handled will increase. Such 
calculations can play a role analogous to that of experiments in physics and may 
suggest new theoretical lines of attack. 

There are, broadly speaking, at least two different ways to approach numerically 
the problems dealing with the dynamical behavior of continua. The kinetic theory 
of gases assumes the physical reality of the discontinuum. One calculates the proper- 
ties in the large of the motion of N “points’”—atoms or molecules which have sta- 
tistically given velocity distributions and are subject to incessant collisions with 
each other. There are forces acting between these points, e.g., deriving from poten- 
tials, but whose form is only imperfectly known. Many, but not all, properties of the 
macroscopic motions are largely independent of the exact form of the interactions 
between these particles. This is not the place to enter into a description of the way 
to derive the hydrodynamical equations from the Boltzmann integral-differential 
equations describing the microscopic behavior of such systems. Suffice it to say that, 
e.g., Navier-Stokes equations can be so derived and they will describe the behavior 
of certain statistical averages (or functionals on the 6N dimensional space) which are 
interpreted as macroscopic quantities like density, pressure and velocity, of a point 
in space (three dimensions) as functions of time. 


Received 12 May, 1958. This work was performed under the auspices of the U. S Atomic 
Energy Commission. 
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What we prefer to discuss here briefly is the numerical procedures which are sug- 
gested by this model. The number N is, in problems of hydrodynamics, enormous. 
N is greater than, say, 10” for 1cc of liquid. Is it possible to scale down this number 
to a value practical for numerical computations and still be able to observe mean- 
ingfully the behavior of the functionals in which we are interested? 

The answer, it seems to us, is in the negative at the present time. If one takes 
the Boltzmann equations literally and considers the individual points of calculation 
as representing atoms, then to obtain the “average” velocity of a point of the gas 
one would require, say, k particles per cell in a spatial resolution in which we are 
interested. Let the number of cells be /. (In practice, the resolution equals d on a 
linear scale and 1 equals d’; where r is the number of dimensions, and equals 1, 2, 
3 (in problems in one, two, or three dimensions without special symmetries reduc- 
ing the number of independent space parameters).) The total number of points is 
then 3 = kd’. If we want a statistical error of the order of 1%, then k ~ 10*. If the 
linear resolutien is to be of the order of 5%, say, then d is 20. Even in one dimen- 
sion, we would have to consider 200,000 points. This is much too high for present 
computers and should make it clear that in a numerical calculation, the “points” 
have to be thought of as representing not individual atoms but rather large aggre- 
gates of atoms. The behavior of each point has to be schematized so as to represent 
a statistical average of a great number of atoms. The numerical work will not reflect 
the Boltzmann equations, but the simpler equations which are its consequences; 
each point of our calculation represents, for example, the center of mass of a collec- 
tion of atoms. The implicit assumption in this set-up is, therefore, that such a mass 
remains coherent during the entire course of the problem. That is to say, the mole- 
cules initially close to each other remain so throughout the problem and do not 
diffuse too much with respect to each other—the small globule of the fluid does not 
distort too much. 

Parenthetically, we may add that this question is connected with the whole com- 
plex of difficulties which one encounters even in the purely mathematical studies 
in the theory of functions of several real variables. The so-called density theorems 
and the notion of set-derivation can serve as examples: The density of a set of points 
at a given point has to be defined through a sequence of sets such that not merely 
their diameters shrink to zero, but the sets have to be “not too thin”—for example, 
in case of rectangles enclosing the given point, the ratio of the sides has to remain 
bounded. This sort of assumption is necessary for the validity of Vitali’s theorem, 
etc. Physically it is clear that in our case we have to assume that the surface-to- 
volume ratio for the globules has to remain bounded for any a priori evaluation of 
the pressure, which acts on its center of mass. One has to introduce pressure, and 
there seem to be at least two obvious ways to do it: 

a) Assuming the knowledge of the equation of state, there is the dependence of 
the pressure on local density (we assume as given the thermodynamic nature of 
the process). In this outline we may specialize, say, to either an isothermal or adia- 
batic process, that is, p = f(p). The pressure gradients which are evaluable through 
gradients of density will be calculated then by estimating these gradients through 
the geometry of the instantaneous appearance of our system of points which repre- 
sent, we emphasize, the positions of centers of mass of globules. Our problem, then 
is that of finding a rule or recipe to estimate the density at a point of space given 
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only a finite system of points (rather widely separated!). The limiting case of an 
enormous number of points presents no difficulties, for one could simply count the 
number of points in a square of a fixed mesh, and this number will be proportional 
to the density. In practice, since we are limited to a moderate number of points 
for the whole fluid, the question is to estimate the density in the most reliable way. 
Consider the case of two dimensions. We can think of the points located initially 
in a regular array, for example, on the vertices of a rectangular division, or better, 
points of a triangular subdivision of space. Of course, after some “cycles” of the 
computation, the geometry of the system will change and one could proceed to 
estimate the density as follows: Enclose each point by triangles with the closest 
points as vertices. The smallest (in area) triangles in whose interior a given point 
is located would give, through the ratio of its area to the area of the original tri- 
angle, at least an idea of the change in density at that point. 

This procedure suffers from several drawbacks. There is the question of the com- 
putational stability of the calculation. The selection of the nearest points leads to 
discontinuity, in time, of the area. In a rectangular subdivision the more “classical” 
definition of density through the Jacobian in a finite approximation requires the 
knowledge of the position of four nearby points x + h, y + k. In the equations of 
motion for each of our points, we need the gradients of the density in the Lagrangian 
coordinates (the ‘‘a,b,c’’ not the “x,y,z’’). In our crude way of calculating the den- 
sities themselves, the computation of differences in fixed directions in space at a 
given time, the nearby points for increasing a,b,c may not be sufficiently close, and 
the gradients may be very inaccurately estimated. In the case of points in the 
boundary of the fluid, say with vacuum, one needs special prescriptions. All this 
introduces even more serious errors. 

b) There is another way to introduce, numerically, the forces due to pressure 
gradients. We could imagine repulsive forces acting between any pair of our points. 
They would in simple cases depend on the distance alone (the forces should derive 
from potentials if we assume scalar pressure—no tensor forces—no viscosity, etc. ). 
The form of the potential will, of course, depend on the equation of state. In a one- 
dimensional problem the nature of this correspondence is clear—it suffices to have 
forces between neighboring points only. The continuity of motion guarantees the 
permanence, in time, of the relation of neighborhood. The situation is however 
completely different in two or more space dimensions. (A general remark here: In 
one-dimensional problems the points of our calculation can best represent the 
boundaries of zones into which our fluid or gas is sub-divided. These keep their 
coherence and shape, and the only meaningful parameter for the spatial distribution, 
the density, is inversely proportional to the width of the zones. In two or more di- 
mensions the boundaries of zones are not easily describable by points or systems 
of a few points, and these points, we will repeat once more, correspond to centers 
of mass of volume.) We have to define the neighbors of a given point. One can do 
it in reference to the original geometry of a system of points that is to keep the 
relation which existed initially. But this is not good except in the case of infinitesimal 
or small deformation. The neighbors of a given point will change in the course of 
time. If one tries to calculate the resultant force on a point by calculating it from 
all the points in the mesh, not merely the neighbors, one should remember that the 
number of calculations increases with the square of the number of points. A “‘cut- 
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off” for the force is necessary at some distance and the force is not calculated if 
the distance between two points exceeds a certain constant. One has to remember 
that in the initial position, that is to say, in the regular lattice in which the points 
are arrayed, the cut-off has to be selected so that with it we obtain the initial dis- 
tribution of pressure. The pressure gradient is given directly as the resultant of all 
the forces acting on a point from its “neighbors” and depends on the actual posi- 
tion of the point without reference to the (“forgotten’”’ by the system) initial con- 
figuration. We will discuss in the sequel, for some concrete problems, how such cal- 
culating schemata operated. 

To summarize briefly the above computational scheme: The particles represent 
small parts of the fluid. The forces due to pressure gradients are introduced directly 
by imagining that neighboring or “‘close’’ points repel each other. The dependence 
of this force on the distance between points is so chosen that in the limiting case 
of very many points, it would represent correctly the equation of state. That this 
is possible, in principle, is clear a priori: the density is inversely proportional in the 
limit of a very large number of points to the square (in two dimensions) or cube 
(in three dimensions) of the average distance between them. The pressure is a 
function of density, and this being a function of the distances, we obtain an ana- 
logue of the equation of state by choosing a suitable distance dependence of the 
force. The Lagrangian particles are at Pi:(21 ,y: P2: (x2 Pw: 
(tw ,yw ,2w). The forces (repulsive) between any two are given by F;,; = 
F(d(P;,P;)) where d(P; ,P;) is the distance between the two points. 

The average value of d at a point of the fluid is, in the limit of N — ©, a func- 
tion of the local density: d ~ p*”. 

The pressure p is a function of p alone in, say, isothermal or adiabatic problems. 
The pressure gradients give then the expression for our F. 

There is so far no general theory and the convergence of such finite approxima- 
tions to the hydrodynamical equations remains to be proved, but even more im- 
portant than that would be an estimate of the speed of convergence. 

One could assume, as a starting point for a numerical calculation, instead of the 
partial differential equations of hydrodynamics, e.g., the equations of Euler-La- 
grange, a mathematical description which is somewhat more general: the “points” 
of our calculation need not correspond to the material points of a fluid, but instead 
may represent—more generally—some other parameters of the problem. 

After all, in many problems one is not interested in the positions of every given 
particle of the fluid, but rather in the behavior, in time, of a few functionals of the 


motion; for example, if a,b,c are the “laboratory” or Lagrangian coordinates in 


fixed space in the classical formulation, one deals with the functions 2x(a,b,c,t), 
y(a,b,c,t), z(a,b,c,t) which are interpreted as a position at time ¢ of the point which 
at time ¢ = 0 occupies the point (a,b,c). p(2,y,z,t) is the local density which is 
computed by differentiation from the knowledge of x,y,z and the derivatives 
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The pressure p is, for our purpose, computable from p. It is always possible to 
think of the functions z,y,z (which satisfy the Euler-Lagrange equations) as devel- 
oped into series: 
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y = ijk = 1,2, --- 


Here the y’s are fixed functions of a,b,c alone. The functions need not form orthog- 
onal systems but, to fix the ideas, we might think of them as being terms of the 
Fourier series of Rademacher functions, for instance. The a,6,y are functions of 
time alone and can be treated as abstract Lagrangian particles. The partial differ- 
ential equations for x,y,z are replaced by a system of infinitely many (a discrete 
infinity) of ordinary differential equations describing the change in time of the 
a,8,y. To see the validity of such an approach, we shall illustrate this proposal in 
an example. 

In a one-dimensional problem, we have to find the function x(a,t) satisfying the 
equation 


da 

with a given initial distribution of density and velocity of the fluid. The ordinary 
numerical procedure for a solution consists in the replacement of the continuous 
variable x by a discrete one, that is, x(a) isreplaced by xz; i = 1, 2, --- N. Each 
x; obeys a Newtonian equation of the second order for the x;(t) where in practice 
we also replace t by a discrete sequence of times and obtain a system of difference 
equations. Our introduction of the fixed functions of space ¥(2) and the a;’s amounts 
to a Lagrangian change of variables where instead of the x; which are the actual 
“points” of our fluid, we introduce new variables (q;) in Lagrangian notation: 


where the functions are “holonomic’’, that is, they do not involve the velocities 2; 
in a non-integrable form. For example, the analogue of the Rademacher functions 
would be: If n is of the form n = 2" 


n/2 n 
i=1 i=(n/2)41 
n/4 n/2 (3/4) n n 
i=1 i=(n/4)41 i=(n/2)4+1 i=(3/4)n4+1 


The differential equations for the x; will be replaced by a system of equations for 
the g; ; the forces which are given directly for the x variable by pressure gradients 
will be replaced by “generalized” forces which are functions of the g; and their 
derivatives. In cases where the ¥ form an orthogonal system, the kinetic energy 
will still be a quadratic function of g; and the a;. This procedure is, of course, 
strictly legitimate in the case where we consider the x variables being expressible 
by the q; . 

A general question arises: Which, in a given problem, are the most convenient 
variables (q)? It is clear that in many problems of hydrodynamics one is interested 
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mostly in certain over-all properties of the motion of the fluid—one wants to 
know the behavior of certain given functionals. 

In the classical approach one calculates first the x,y,z for each of the points of 
the space a,b,c. In practice, one is limited to a finite number of these points, and 
it is perhaps plausible that in many cases it would be “better” to know, instead, 
an equal number of coefficients a;;, in the development of x(a,b,c,t) in a given 
(Fourier) series than to know the value of x in a corresponding number of points 
in the space of the a;,’s. This is especially clear if one should know, a priori, from 
the physical nature of the problem that the functions x,y,z are reasonably smooth 
(e.g., in the sense that the absolute value of the partial derivatives remains 
bounded). In the terminology of the a;;,’s, this smoothness amounts to the knowl- 
edge that the coupling between the a’s which have high indices and those with 
small values is small. That is to say, the series converge rapidly. Physically speak- 
ing, it means that the high frequencies are less excited. 

Such reduction of a continuum to a discrete countable infinity is, of course, very 
familiar in some problems of quantum theory—the radiation field, etc. The prob- 
lems that have been dealt with in quantum theory have been mostly linear; there 
are no forces between the coefficients and each mode or “particle” represented by 
the coefficients behaves independently of all the others. 

For the case of our hydrodynamical problems, the forces due to pressure gradient 
are functions of all the coefficients; that is, we have a true n-body problem, and 
the ‘‘quantization” is justified practically only if a cut-off at a finite index (7) is 
permissible together with an estimate, in advance, of the error for all ¢ under con- 
sideration—that is to say, if the high modes do not become increasingly important. 
When the high modes do acquire more energy as time goes on, the classical approach 
becomes difficult also. The onset of turbulence or of the positional mixing of the 
fluid renders the classical treatment (partial differential equations) illusory. The 
absolute values of spatial derivatives increase and even their existence for finite 
times cannot be guaranteed and a statistical approach is indicated. It would seem 
that in problems where this behavior is expected, the approach through a study of 
a finite model involving perhaps a change of coordinates, like the one proposed, 
may be of possible utility. If one wants to study the rate of development of insta- 
bilities or the rate at which mixing proceeds, then the flow of energy from the low 
modes (that is to say, from the a’s with small indices to those with high indices) 
will show just the rate of increase in mixing positionally or, in the derivatives, the 
rate of increase in vorticity. 

We would like to mention another possible advantage in the use of general coor- 
dinates in numerical work. A description of the motion of the fluid through the 
partial differential equations of hydrodynamics postulates the existence of the par- 
tial differential expressions. It is well known that even in comparatively simple 
problems these derivatives exist only up to a finite value of time, after which dis- 
continuities develop, e.g., in pressure and in density (that is to say, shocks), and 
one has to use different methods to treat such discontinuities. On the other hand, 
a Rademacher series or Fourier series may very well represent a step function, 
more generally functions without derivatives at some points. The a; might then 
continue to be used as dynamical variables even after discontinuities occur in de- 
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t=O 


Fic. 1.—Initial configuration, t = 0 


rivatives like 0x/da making the differential expressions in the Euler-Lagrange equa- 
tions unmanageable. 

In order to test some of these general speculations on actual problems, we have 
run some numerical computations on an electronic machine, the MANIAC, in the 
Los Alamos Scientific Laboratory and on its prototype at the Institute for Advanced 
Study in Princeton. The main purpose of these calculations was exploratory, and 
the feasibility of using certain numerical schemes on the machine was considered 
of more interest than the precision of the results. The main point of interest was 
the amount of time necessary in order to compute on the machine the time behavior 
of certain functionals of our systems. The problems were mostly of the initial value 
type, the integration was in time, and the calculation ran in cycles, for which we 
decided that about five minutes would be allowed. The nature of the problems and 
the eharacteristics of the machine with this requirement fixed the maximum num- 
ber of mass points at about 256. The first problem studied involved the motion of 
a heavy fluid on top of a lighter one—usually known as the Taylor instability con- 
figuration. 

The initial configuration was the 16 X 16 array shown in Fig. 1. The particles 
are of two types, one having a mass which was double the mass of the other. The 
force was the same between any two particles and was chosen to be inversely pro- 
portional to the distance of separation. This algebraically simple choice was made 
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mainly to keep the computation time to a minimum. The whole system of particles 
is enclosed in a unit square with the heavier particles on top and all particles sub- 
ject to an external gravitational field, an unstable configuration. With our simple 
force law the computation for each pair of particles took about 30 milliseconds. 
With our 256 particles, or over 32,000 pairs, the computation of resultant forces 
would have taken 15 minutes. This time can be reduced by introducing a cut-off 
in the force so that for a particle lying outside such range of influence, the only 
computation necessary is that of a separation between the particles. In order to 
avoid even such computations of distance which involve the long multiplication 
time of the machine, we arranged the cut-off in terms not of the Euclidean metric, 
but in terms of another one (Minkowski): the sum of the horizontal and vertical 
distances of separation thereby reducing the time of ascertaining the distance in 
such case to about 3 milliseconds for these pairs. With this, the total computation 
time (for a cycle) was about 5 minutes, and a typical problem would run 150 cycles 
—all together, with printing of results, somewhat more than 10 hours. 

When a particle approached the side of the container closer than a cut-off distance, 
a special situation arose. The most convenient way of treating the wall was to 
create a virtual particle located on the wall at the same horizontal or vertical posi- 
tion as required to contain the real particle. Again we should emphasize that for 
our first experiments, this recipe was chosen for computational conveniences rather 
than as a mathematical or physical requirement. Thus, also, all the parameters for 
the problem were chosen to be simple powers of 2 so as to be able to use the fast 
operation of the left and right binary shift, rather than the slower operation of 
direct multiplication. 

With our grid of points, crudely as it was chosen, it hardly can be expected that 
the details of the motion will be exactly described. A typical configuration is shown 
in Fig. 2; a much later cycle (later time) configuration is shown in Fig. 3 and the 
formation of the ‘‘atmosphere”’ is apparent. What can be hoped, however, is that 
some functionals of the motion will be more accurately depicted. We are interested 
in the transition phase and in the time rates of mixing on a large scale between 
the two fluids. One of the functionals which we observed and plotted as a function 
of time is the total kinetic energy of the particles divided into two parts: the kinetic 
energy of the horizontal and vertical motions separately (these are shown on Fig. 
4). Even though the motion itself is very irregular, these quantities seem to present 
rather smooth functions of time. In the case of a stable configuration with the lighter 
fluid on top and the heavy fluid on the bottom, there would be only a periodic 
interchange of kinetic and potential energies.* In the unstable case, there should 
be an increase of kinetic energy which persists for a considerable time. The hope is 
that with many more calculations one could try to guess from such numerical re- 
sults at least the form of an empirical law for the increase of this quantity as a 
function of initial parameters. 

Another quantity of interest is the spectrum of angular momentum which may 
be defined, for example, in the following way: we draw around each particle a circle 
of fixed radius and calculate the angular momentum in each such region. We then 


* See Fig. 5, showing the configuration at a time later than the one in Fig. 2. 
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KINETIC ENERGY 


Fig. 4.—Kinetic energies of horizontal and vertical motions 


STABLE CASE 


Fia. 5.—Typical configuration, stable case 
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take the sum of the absolute values or the sum of the squares of these quantities. 
This gives us an over-all measure of angular momentum on a scale given by the 
radius. Varying the radius, these numbers may then be studied as a function of 
both time and the radius. It is interesting to consider the rate at which the large 
scale angular momentum is transferred to small scale eddies—also, the spectrum 
in the asymptotic state, if it exists. All this, of course, as a function of the param- 
eters of the problem, the ratio of the densities of the two fluids and the external 
force. It is clear that very many numerical experiments would have to be performed 
before one would trust such pragmatic “laws” for the time behavior of mixing or 
increase in vorticity. 

Another reason for selection of our problem was an interest in the degree of spa- 
tial mixing of two fluids, starting from an unstable equilibrium. To measure quan- 
titatively such mixing in configuration space, a functional was adopted similar to 
the one just mentioned for the spectrum of angular momentum. At a given time ¢, 
a circle is drawn around each point P. In each circle, we look at the ratio r of the 
number of heavy particles to the total number of particles in this region. We take 
the quantity u(P,t) = 4r(1 — r). This gives an index of mixing of the fluid in the 
circle, being equal to zero if only one fluid is present and equal to one if particles 
of both fluids are equally present in it. We average this quantity over all the cir- 
cles. Initially this average is very close to zero, the only region where it differs from 
zero being around the interface. As the mixing proceeds, our average measure of 
mixing increases with time. We have plotted this quantity for several different radii. 
These functions are again comparatively smooth even though the interface be- 
tween the two fluids becomes highly convoluted. Again, one might expect that 
after a sufficient number of numerical experiments, one would obtain a hint of 
the time behavior of this “mixing functional’’, or at least an idea of the time T 
taken for the over-all mixing (which is initially close to zero) to become of the 
order of 3 or 1/e, say. This time, for dimensional reasons, must depend on +/L/a 
where L is the depth of the vessel and a the acceleration of the lighter fluid into 
the heavier one. 

The problem of Taylor’s instability in its initial stages was also examined in Los 
Alamos experimentally for the case of a heavy gas on top of a light gas. The inter- 
face between the two gases involved an irregularity. Photographs taken at various 
times show configuration of the two gases not unlike those developing in our cal- 
culation. If one wanted to put credence in calculations like the above, the form of 
the proper force law imitating the real equation of state would have to be chosen 
very carefully. Also one would have to study the question of how the accumulation 
of error due to the finiteness of our grid will effect the behavior of our functionals. 
We have treated compressible gases. An attempt to treat numerically incompres- 
sible fluids in a similar way encounters serious additional difficulties: 

In order to preserve the volume of each fluid element, one would have to keep 
constant the areas, in two-dimensional problems, of triangles or other elementary 
figures whose vertices are occupied by our particles. This means that there is a 
great number of constraints added to the equations of motion for the repelling mass 
points. This was computationally not feasible on the available machines for prob- 
lems where the number of points was of the order of 100 or more. 
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Another way of calculating in the above spirit would have been to postulate addi- 
tional forces, doing no work but strongly tending to preserve the elementary area 
defined by our mass points. This also turned out to be impractical. 

Atomic Energy Commission, Washington, D. C. and 
Los Alamos Scientific Laboratory, Los Alamos, New Mexico 


1. M. W. Evans & F. H. Hartow, The Particle-in-Cell Method for Hydrodynamic Calculations, 
Los Alamos Report 2139, 1957. 
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On the Simultaneous Determination of Several 
Eigensolutions of a Self-Adjoint System of 
Differential Equations 


By P. Laasonen 


1, Abstract. Any finite number m of eigensolutions of a definitely self-adjoint 
system of ordinary differential equations may be approximated simultaneously to 
any desired accuracy by an iterative procedure and the solution of an (m X m)- 
matrix eigenvalue equation. If the method is used with rounded numerical values, 
intermediate purification steps are found to be necessary at times; a rule for the 
estimation of roundoff errors is established. A simple example illustrates the theo- 
retical argument. 


2. Definitely self-adjoint problems. Let the nth order system of ordinary differ- 
ential equations have the form 


= (F(x) + rG(x))u(z), 


where wu is an n-vector, F and G(n X n)-matrices of continuous functions on the 
interval (a, b), and \ the eigenvalue parameter. The boundary conditions considered 
have the form 


Au(a) + Bu(b) = 0, 


where (A, B) is a constant (nm X 2n)-matrix of rank n. The eigenvalue problem 
thus formulated is called definitely self-adjoint by Bliss [1], [2], if the following 
conditions are fulfilled: 

le. There exists a non-singular matrix T(x) such that 


RT =0 
dx 


(1) TG +G'T =0 
AT™(a)A’ = BT™(b)B’ 

2°. The matrix S(x) = T’(x)G(zx) is symmetric and positive definite or semi- 
definite. 

3°. If \ = 0 is an eigenvalue and w(x) a corresponding eigenvector, then 


uo’ (x) S(x)uo(x) 0. 
We will replace the last condition by the stronger one: 
3°. X = 0 is not an eigenvalue. 


Definitely self-adjoint problems thus defined have at most countably many eigen- 
values \,;(7 = 1, 2, ---), all real. In the following we assume | \; | S | A; |, if 7 < 7. 


Received July 9, 1958. The preparation of this paper was sponsored by the Office of Naval 
Research. 
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The corresponding eigensolutions, denoted by u;(x), may be chosen to satisfy the 
orthonormalization condition 


[ ui dx = 


3. An iteration procedure. Let Vo(x) be an (n X m)-matrix whose columns are 
continuous and linearly independent on (a, 6). Use Vo(x) as the initial matrix for 
the sequence V;(x) determined successively as solutions of problems 

Vela) = + 
(a 
AV,(a) + BV.(b) = 0, 


and define matrices Qs , for 0 = a = 8B, by 


= 1,2,---) 


b 
[ dz (8 = 0,1,2,-+*). 


These generalizations of Schwarz’s constants are symmetric and independent of 
the choice of the index a. In this analogy the eigenvalues of the matrix equations 


(2) (Qea-1 — KQea)t = O 


correspond to the Schwarz’s quotient. Denote the diagonal matrix of its eigenvalues 
by K, and a matrix of corresponding eigenvectors by 7’, ; hence 


a G7 = 0. 


4. Theorem on convergence and its rate. Assume first, the following conditions: 
(a) Vo(x) is arbitrary in the sense that no linear combination of its columns is 
orthogonal to the first m eigensolutions u;(z) (i = 1, 2, ---, m). (b) The first m 
eigenvalues are smaller in absolute value than all others, i.e. | Am | < | Am4i |. Then 
the sequence of the solutions of (2) provides successive approximations tending to 
the first m eigensolutions of the original problem: 

If the eigenvalues of Ka are ordered and the vectors T,, normed properly, then the 


former ones converge to \; (i = 1, 2, «++, m) and the column vectors of 
(3) V.(r)T 
to u; (i = 1, 2, ---, m) asa— &~. The rate of convergence is such that the ith eigen- 


value as well as all components of the corresponding vector differ from their respective 
limits by amounts 0( | |"). 

The convergence theorem has been proved under more general conditions in the 
author’s papers [3] and [4]. There it was found for instance that if the condition 
about the generality of Vo(2), expressed in (a), is not satisfied but some among the 
m first eigensolutions of the original problem are orthogonal to linear combinations 
of columns of V(x), then convergence still prevails, the limiting eigenvalues and 
eigensolutions of the process are just replaced by some others, associated with the 
next smallest eigenvalues. Again, if condition (b) above is not fulfilled, then the 
convergence still occurs with respect to eigenvalues with smaller absolute value 
than | A,,4: | and to the corresponding eigensolutions. 
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The rate of convergence is not discussed in the previous papers; the error esti- 
mate is, however, obtainable from the proof in the latter paper by quite straight- 
forward considerations. 


5. Rounding off errors. In order to have an estimate of the effect of rounding off 
errors of the Q’s, unavoidable in numerical computation with truncated values, dif- 
ferentiate the eigenvalue equation (2), 


(dQea—1 KdQoa)t + (Qea—1 KQoa) dt = 0. 
Multiplication from left by ?’ and division by 


(4) t= Kt'Qoa t 
gives, observing the transposed form of (2), the final formula 
(5) dk _ UdQeait  tdQrat 


Let A and U(x) be the diagonal matrix of eigenvalues \; and the matrix of cor- 
responding eigensolutions u;(x), respectively. Moreover, let C, defined by 


b 
C= U'(x)S(x)Vo(x) dz, 


be the matrix of coefficients in the expansion of Vo(z). Then any V.(x) (a = 1, 
2, --:) has an absolutely and uniformly convergent expansion 


V.(z) = U(z)A “°C. 


The condition (a) above is equivalent with the statement that the top square 
submatrix C; of C, formed by its m first rows, is non-singular. Partition C and A 


as follows, 
C= A= 
C2 0 Ae 


where C; and A; are (m X m)-matrices and use the abbreviation 


t 

Pp = = 

As “Cot 

for the coefficient vector in the expansion of an approximation 


Veat = UA **Ct = Up. 


In the proof, previously mentioned, it has been shown that when a increases indefi- 
nitely the coefficient vector p, provided the eigenvector ¢ of (2) is chosen and 
normed properly, tends to a limit vector all of whose elements vanish with just 
one exception; this element, the ith, is among the first m and may be assumed to 
be 1. Moreover, the convergence is such that the deviations of the elements of p 
from their respective limits are of the magnitude 0( | \;/Am4. |“). Of course, the 
finite vector p: has the same properties. Hence, if ¢,,"" is the notation of the m-vec- 
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tor whose ith element is 1, all other elements 0, then the vector p: at the ath stage 
has the expression 


Di = + | [Ge 


where a, is an m-vector bounded for a— ©. 
Express now ¢ in terms of p; : 


Cr Arp = Cr Pen” + | 


where b, is also an m-vector bounded for a — . In order to estimate the right 
hand side terms in (5), i.e. quotients of the form 


t'dQs t 
(7) 


consider first their denominators. Qs has an expression 
Qe = C’A °C = Cy'Ar °C, + C2’ °C? 
and therefore, for the particular value 8 = 2a, 
UQeat = pr’Ar “Cy Ay "Cy + C2! Ae **C2) Cy Ar “pi 
X + | Ai/Ama 


Since the second term in the second bracket decreases at least like | Xm/Amaa |“, 
the total product obviously has the asymptotic expression \,’*. By (4) one finds 
that the asymptotic expression for t/Qe.—1t is 47°". 

Using (6), the numerator of (7), for 8 = 2a, is clearly approximated by \,** 
multiplied by the ith diagonal term of C;/"dQe.C , as a increases. If C; is arbi- 
trarily chosen, that is, if Vo(«) is originally arbitrary, then the order of magnitude 
of this diagonal term is determined by the product of \; °*, which determines the 
order of magnitude of all elements in Q2., and a proper measure of the relative 
rounding off error in the elements of Qe. , say 6. Hence, for an arbitrarily chosen 
Vo the quotient (7) bas, for 8 = 2a, the order of magnitude 


The same estimate is, of course, also valid for 8 = 2a — 1. For an increasing a 
the effect of the rounding off error in the approximations k; for \; with an absolute 
value less than | \; | will accordingly increase indefinitely even if the relative mag- 
nitude of the rounding off error should remain bounded. 

On the other hand, if C: is almost diagonal, then the ith diagonal term of 
C1’ "dQeaCy* has the order of magnitude \;°*6 and the quotient (7) the order of 
magnitude 


(6) 


= 6. 
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In this case Vo consists of columns already almost equal to the first m eigensolu- 
tions and the method gives all eigensolutions with an approximately equal accuracy. 

If the condition (b) is not fulfilled, i.e. some of the first m eigenvalues are equal 
to some later eigenvalues in absolute value, then the above statements are still 
valid regarding those eigenvalues whose absolute values are smaller than | A»41 |. 


6. Practical use of the method. After an initial matrix V(x), i.e. any set of m 


linearly independent trial vectors vp‘ (x), has been assigned, one has to solve m 
independent boundary value problems 


(8) (2) = + (i = 1,2,--- ,m). 


(An (a) + Bu(b) = 0; 


The solution vectors »;‘” (x), i = 1, 2, ---, m, form the matrix V(x). Usually it is 
already advisable after this first step to improve the initial matrix for the next 
step. To this end the matrices Q: and Q2 are formed from Vo and V,, the corre- 
sponding eigenvalue equation (2) solved and V;(x) replaced by the product V;(x)T; 
from (3). It is true that the procedure gives theoretically completely equal results 
whether continued with V;(xz) or Vi(z)7T. as the next initial matrix. However, 
there may be a substantial difference in the practical results, due to the fact that 
the top square submatrix C; of the coefficient matrix C related to the latter is usually 
much closer to a diagonal matrix and the effect of rounding off errors accordingly 
decreased. 

This effect may be estimated at any stage and for any eigenvalue by evaluating 
quotients (7) on the right hand side of (5). The denominator hereby may be com- 
puted directly after the eigenvector ¢ has been determined; the order of magnitude 
of the numerator may be obtained by replacing dQ, for instance, by the positive 
diagonal matrix whose diagonal elements are the estimated positive rounding off 
errors of diagonal elements of Q. 

If the fractions (7) are small, then, of course, several iteration steps like (8) 
may be performed without needing to interrupt the process for an intermediary 
purification step. 

Finally it may be observed that if one wants to determine the first m eigensolu- 
tions, then it might be useful in some cases to carry out the computations with a 
larger number m’ > m and the excessive least accurate m’ — m solutions related 
to the largest eigenvalues may be omitted at the end. 


7. Comparison with the customary method. If more than just the lowest eigen- 
value is needed, then the customary iterative method computes the wanted eigen- 
values successively in the order of increasing absolute value. Hence, the steps con- 
sisting of integrations of boundary value problems of type (8) are equal in the cus- 
tomary as well as in the described method. The orthogonalization steps which are 
necessary at times in the former method in connection with higher eigenvalues have 
their counterpart in the purification steps of the latter method. These intermediary 
steps, involving solution of auxiliary eigenvalue matrix equations, generate addi- 
tional computation. The associated extra work is, however, mostly outweighted by 
following advantages. 
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If the approximate eigenvalues are interpreted as minimal values of functionals 
of Rayleigh-Ritz type, then the present method bases its trial functionals at each 
step on linear combinations of m vectors instead of one vector. Accordingly all 
results are obtained, after some number of iterations, with substantially greater 
accuracy than by the method based on the use of one vector for \; , two vectors 
for dz , ete. 

Furthermore, separation of two or more eigensolutions associated with eigen- 
values of almost equal absolute values is established automatically, whereas this 
case is always quite troublesome by methods which proceed with one vector at a 
time. 


8. An example. To illustrate the described method consider the simplest possible 
problem: 


r-( o-(% 9) 4-69) 3). 


The solution T of (1) and the corresponding S are easily found to be 


s=( 9): 


Moreover, the conditions 2° and 3” are fulfilled. Hence the problem is definitely 
self-adjoint. 
In order to find the first two eigensolutions take for instance a = 0, b = 1 and 


choose 


Solving (8) for the two corresponding column vectors results in 
(6x — 32° 32 — 2° 
6\6-—6e 3-3 
and the matrices 


g, - 1. (4% % = (872 427 
720\25 5040 \427 272/° 


The eigenvalue equation (2) with a = 1 gives now the eigenvalues, 
_ 846 F 964/51 _ (2.4680 
~ 23.563’ 
and the eigenvectors 


2 2 
4- 
A second iteration step, based on the immediate use of V;, gives the matrix 


y, = 2 (4% — 202° + 52° 25x — 102’ + 2° 
720 \40 — 6027 + 202° 25 — 3027 + 52‘) 


TI 
he 
| 
T= 0 of 
: 
TI 
ta 
| 
| 1 
as 
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The corresponding matrices Q; and Q, are 


= 1 (19584 12465\_ = 1 872960 555731 
* 362880 \12465 7936)’ * 39916800 \555731 353792 


and the solutions of (2) with a = 2: 


_ 10496290 6400+/1725010 _ {2.46740208 


qT, < 2062 2062 
3328 + 5+/1725010 3328 — 5+/1725010 


The relative errors of these approximations of \; and d: are 4 - 10’ and 4 - 10“, 
respectively. This accuracy is, however, possible only by using untruncated values 
of the elements of Q; and Q,. This may be seen by computing the ratio (7) for 
instance with Q, in the denominator and 10° - I as the kernel dQ in the numerator. 
Inserting for ¢ the first or the second column of T2 one obtains 7 - 10~’ or 5 - 10°’, 
respectively. Since the elements of Q, are of the order of magnitude 10~’, this result 
shows that truncation of the elements of Q, by 10~° causes at the first eigenvalue 
an error of the same magnitude but at the second eigenvalue an error 10™*. In fact, 
this result can also be obtained by a direct computation. If the elements of Q; and 
Q, are truncated to about 6-place values, 


0.0539683 0.0343502 0.02186949 0.01392223 
* \0.0343502 0.0218695/ * \0.01392223 0.00886324 


then the following eigenvalues are found for (2): 


=> 


21.669 


The error caused by this truncation into the first and second eigenvalue is hence 
at the eighth and at the second place, respectively. 

If on the other hand the second iteration step is made by an improved matrix, 
taking 


V;* = Vi T; 
_1 — — + +/51(32 — 2°) — Gx” — 42° — V/51(3x — 2°) 
6 \24 — 122 — 122° + 3+/51(1 — +24 — 122 — 122° — — 2”) 


as initial matrix, then the corresponding new vector matrix is 


180x — 80x° + 10x* + 42° + +/51(25x — 10z* +2°) 
1 180x — 80x° + 10x* + 42° — +/51(25x — 102° + 2°) 

~ 120/180 — 2402” + 402° + 202 + 5~/51(5 — 6x" + 
180 — 240x + 402” + 20x* — 59/51(5 — 62° + x‘) 


V.* 
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Related kernel matrices are 


m2 202372 + 28337+/51 4 

> ~ 90720 4 202372 — 28337+/51 
3007300 + 421105+/51 68 

* “~~ 3326400 68 3007300 — 421105+/51 

and the corresponding solution of (2): 
_ 10496 290 6400+/1725010 _ {2.46740208 
847269 ~ 22.309 
1 —0.0000115 
am 1 ) 


In this case the truncation of the numerical values of the elements of the Q’s 
does not produce essentially greater error in the second eigenvalue. This may be 
seen by computing the characteristic quotients (7): t’Q,*t, by inserting the second 
column of 72*; this gives ~3 - 10° and, if dQ,* is taken to be the diagonal matrix 
containing diagonal elements of Q,* multiplied by 10°, then ¢/dQ,*t is ~3 - 10°”, 
hence the order of magnitude of the relative error of k is 10°°. And actually, if the 
elements of Q;* and Q,* are truncated to about 6-place values, 


4.46140 4.40917 - 10° 1.808138 2.044252 -10° 
* \4.40917-10* 5.88913-10 2.044252-10* 2.639971- 10 
then the following eigenvalues for (2) are found to be: 


2.4673997. 
'\22.3093475 


The relative errors due to the truncation of the elements of the matrices are hence 
10° and 3 - respectively. 
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On the Enumeration of Majority Games 


By John R. Isbell 


1. Introduction. This paper presents a combinatorial method for the enumeration 
of all strong weighted majority games, and an arithmetical method for the enumera- 
tion of the games having “homogeneous” weights in the sense of von Neumann and 
Morgenstern [1]. In contrast with the combinatorial method of von Neumann and 
Morgenstern [1], both of these methods (a) do not generate extra isomorphic copies 
of the same game, but (b) do generate some unwanted objects (so that tests are re- 
quired), and (c) are recursive on the number of players. The known (von Neumann 
and Morgenstern [1], Gurk and Isbell [2]) complete list of 30 strong simple games on 
n & 6 players (21 of which are majority games) is supplemented by a complete list 
of the 114 strong majority games with 7 non-dummy players. 

A strong simple game (for the rest of this paper, a game) on a finite set N of 
players is (1) intuitively, a scheme for distributing power to coalitions of players, 
all-or-none (this is the sense of “simple” ), and so that if the players are partitioned 
into two parties, one must win (this is the sense of “‘strong’’). (2) Precisely, it is 
a set of subsets of N called winning sets, such that (a) any set containing a winning 
set is winning, and (b) for any S C N, either S or N-S is winning but not both. 
(3) In this paper, we shall identify two games if there is a one-to-one correspondence 
between their players identifying their families of winning sets; to name a more 
definite object one must say, not “the game G”, but “the game G with ordered 
players P,, ---, P,”, or a similar phrase. Beyond this, note that an n-player game 
(e.g. the 435-player game of the House of Representatives) may be converted into 
an (n + k)-player game by adjoining k “voteless”’ players or dummies. The phrase 
“an n-player game G” does not here imply that all n players are non-dummies. (In 
the first part of the argument we need games with dummies; afterward we shall 
exclude them. ) 

A majority game is a game for which one can assign numerical weights w , - - -, 
w, to the players so that the winning sets are precisely those sets which have more 
than half the total weight. Some of the w; may be negative or zero; it is easy to see 
that the corresponding players must be dummies. (Since every superset of a win- 
ning set wins, a player can have negative weight w; only if | w; | is so small that it 
makes no difference.) Given a game G, the question whether G is a majority game 
is effectively decidable, since it turns on a finite system of linear inequalities. No 
better method than the obvious ones is known. It is clear that every majority game 
has non-negative integral weights—even positive integral weights, for any weights 
may be assigned to the dummies provided the weights of the other players are 
made large enough. 

The method described below for enumerating all majority games depends on 
the determination of the game with ordered players with weights (wo , wi, ---, Wa) 
by the systems (wo + wi, We, ---, Wn) and (wo — Ww, We, ---, Wa). The choice 

Received 27 June 1957. Supported in part by the Office of Naval Research under Contract 
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tion fellowship. Presented to the Econometric Society, August 1956. 
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of weights and orderings gives no trouble. However, the method does not generate 
weights, but only a combinatorial description of the game. There is an obvious 
formula for weights for the large game in terms of “suitable’”’ weights for the small 
games, but we have no version of the method which does not involve both the solu- 
tion of systems of linear inequalities and the examination of systems which turn 
out to be inconsistent. There is, however, a fairly simple combinatorial criterion 
which screens out all the inconsistent systems through n = 7. 

Weights (w:, ---, wn) for a game are called efficient if every winning set S con- 
tains a subset which wins by exactly one vote. (This is equivalent to saying that 
every minimal winning set wins by one vote. Von Neumann and Morgenstern [1] 
call these “homogeneous” weights, but this term seems likely to lead to confusion. ) 
It is known [3] that efficient weights w; for a game with ordered players are unique; 
they are non-negative integers; for any other non-negative integral weights v; be- 
longing to the same game with ordered players, w; < v; for all 7; and when the non- 
zero weights w; are arranged in non-decreasing order, they grow no faster than the 
Fibonacci numbers. We call a non-decreasing sequence of positive integers (w , 
+++, w,) an efficient sequence if it is an initial segment of some non-decreasing effi- 
cient game weights (w: , ---, we, -**, Wa). We find a characterization of these and 
a recursive arithmetical method for generating them, which generates nothing but 
efficient sequences. It is necessary, of course, to select those which are actually 
game weights; for n = 7 there are 23 games in the 99 efficient sequences. The test 
for this is simple, and the whole computation is far easier than the other one de- 
scribed above. 

It is known [3] that for n = 4 there are at least 2”~ efficient weighted majority 
games with n non-dummy players. We obtain the upper bound (n — 1) !—actually 
an upper bound for the number of efficient sequences- There is even more room 
between the known bounds for general majority games with n non-dummy players; 
we find there are more than 2” for n = 8, but no upper bound is known short of 
2” (the number of sets of sets of players). 

A little experimentation with the methods presented here will quickly show the 
desirability of an arithmetical method for enumerating all majority games. This 
would presumably involve a unique choice of weights for each game. For n S 7 
each n-player majority game has minimum non-negative integral weights w; . (That 
is, for any non-negative integral v; defining the same game and order, w; S »v; for 
each 7.) Such weights are clearly unique. However, they do not always exist, as we 
show with a certain 12-player game. 

To put the classes of games in perspective, consider the author’s manuscript list 
of 559 non-majority 7-player games without dummies, thought to be free from 
duplications. It was computed by the von Neumann-Morgenstern method some 
years ago and 109 of the 114 majority games were found. 


2. The operators H; and H,. We wish to formalize the construction described 
by the symbol (wo — wi, We, «++, Wn). A motion of a game is a permutation of its 
players taking winning sets to winning sets. Under the group of motions of the game 
the players sweep through transitivity sets which we call roles. A court is a majority 
game with a distinguished role. A natural ordering of the players of a majority game 
is an ordering (Pi, ---, P,») such that the game has weights w; such that wi = 
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W, 2 --- 2 w,. However, a natural ordering of the players of a court is an ordering 


(Pi, ---, P,) such that P; is an element of the distinguished role and for some 
weights w;, = --- Wa. 
Lemma 1. If (w;, ---, wn) and (v), ---, vn) are two systems of weights for a ma- 


jority game G with ordered players P,, ---, Pn, and w, => --- = wa, andaisa 
permutation of the integers from 1 to n such that vag) = «+++ = Vain) , then P; > Paco 
is a motion of G. 

Proof. If w; 2 w; and S is a winning set containing j but not i then replacing j 
by 7 in S yields another winning set, since it does not decrease the weight. If also 
v; 2 v; then 7 and j may be exchanged (by a motion leaving all other players fixed). 
Then the lemma follows by induction. 

For any majority game G of n + 1 2 2 players, we define an n-player court 
H,(G@) and an n-player game H2(G) as follows. It suffices to define games with 
ordered players, H with players Q, , ---, Q, , H’ with players Ri, ---, R, , specify- 
ing that H,(G) consists of the game H with the role of Q; distinguished and H.(G) 
is the game H’. Let (wo, ---, wn) be any system of weights for G, arranged so that 
Wo 2 --- 2 w,. We define H by its weights (w. — w,, we, ---, W,), and H’ by 
its weights (wo + wi, ---, Wn). 

Lemma 2. H,(G) and H.2(G) are determined by the game G; and together they deter- 
mine G. 

Proof. The first half is a routine analysis of definitions, and we omit it. For the 
converse, let G and G’ be majority games with H;(G) = H,(G’) = H, and H.(G) = 
H.(G’) = H,. Naturally order the player Q,, ---, Q, of the court H; and the 
players R, , ---, R, of the court H, . Since H; = H,(G), there exist weights wo = 
-++ 2 wp, for G and a one-to-one correspondence ¢ mapping the last n — 1 players 
P,, ---, P,, of G upon n — 1 of the players of H; so that the omitted player of H,; 
is a member of the distinguished role and the numbers wo — wu, We, «++, Wa, 
assigned according to ¢, are weights for H;. By Lemma | and the definition of a 
role, there is a motion a of H; such that a(¢(P;)) = Q; forj = 2, ---, n. By the 
same device we may regularize the correspondences of G to H2, G’ to H; and G’ 
to H. , so that players with like indices from 2 to n correspond. For any set S of 
indices in {0, 1, ---, n}, by considering the four possible values of S n {0, 1}, one 
can give necessary and sufficient conditions for {P; | 7 « S} to be a winning set in 
G, in terms of Hi; , Hz , and S. The same conditions are necessary and sufficient for 
{P;’ | ¢ « S} to win in G’. Therefore the correspondence P; <> P,’ is an isomorphism 
between G and G’. 

We call an n-player court H; and n-player majority game H, compatible if there 
exists a majority game G such that H,(G@) = HM; and H.(G) = H,. Evidently one 
can define a combinatorial scheme on the lines of a game, which must be the game 
G if G exists. The necessary and sufficient condition for compatibility is then the 
solvability of the system of linear inequalities defining weights for G, together with 
the axioms for a game. Reformulating, one has 

THEOREM 1. An n-player court H; with naturally ordered players Q; and an n-player 
majority game H; with naturally ordered players R; are compatible if and only if both 

(a) whenever {Q;| i « S} is a winning set in H, containing the player Q, , then 
{R;|t S andi 2} is a winning set in Hz , and 

(b) there exist weights u; for H, , v; for Hz , such that u; = v; fori = 2, ---, n. 
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Proof. The necessity of (a) and (b) is obvious. Conversely, suppose (a) and (b) 
are valid. We may replace each u; with max (u;, 0), and v; with max (v;, 0), 
without affecting either game (since players with negative weight are dummies) 
or the validity of (a) and (b). We claim the numbers (v1 + ™)/2, (v1 — m)/2, 
U2, ***, Un are weights for a game G with players Py , Pi, ---, P». Call the subsets 
of N = {Po,---, P,} having more than half the total weight winning sets. For 
any S C N, not both S and N-S are winning. One must win unless they have the 
same weight; but in that case it is clear that two complementary sets in H; or in 
Hz would have the same weight, which is impossible. To prove that any superset 
of a winning set wins, it suffices to adjoin players one at a time. Adjoining a player 
other than P; cannot decrease the weight. Next suppose S is a winning set contain- 
ing Py and not P; . Then the set {Q;|j = 1 or j 2 2 and P; ¢ S} is a winning set 
in H, , by computation; the set {R;|7 = 1 orj = 2 and P; « S} is a winning set 
in H; , by (a) and the fact that supersets of winning sets win in H», ; and Su {P;} 
wins in G, by computation. If S is a winning set containing neither Py nor P; , then 
N-S contains both but fails to win; hence N-S-{ P,} cannot win either, and S u {P,} 
is a winning set. Thus G is a game, and therefore a majority game. To show that 
H, = H,(G) and H, = H,(G) it remains to prove that the ordering (Po, --- , Pn) 
is natural. Replacing P; with Py) never turns a winning set into a losing set, since 
u, = 0. If S is a winning set containing P, and not P; , and furthermore S contains 
Py, then S u {P,;} — {P2} wins in virtue of (a). Then the same is true if Po is not 
in S, by passage to the complement. For i = 2, --- , m — 1, the natural ordering 
of the players of H; implies that replacing P;4: by P; never turns a winning set 
into a losing set. Therefore we may rearrange the given weights for G into weights 
Wo = +--+ = wn by successive transpositions, and the proof is complete. 

The author’s experience suggests that (b) of Theorem 1 may as well be ignored 
in computation; this is discussed in Section 4 below. 

Corouuary 1. For every n-player court H, there exists an n-player majority game 
compatible with Hy . 

In fact there exists an n-player majority game compatible with all n-player 
courts. One system of weights for it is (1, 0, --- , 0); but as we noted before, any 
weights may be assigned to the n — 1 dummies provided the first weight is large 
enough. 

Corouuary 2. Let f(n) be the number of different n-player majority games, g(n) 
the number of these which have no dummies. Then f(n + 1) = 2f(n) — land g(n + 1) 
= 2g(n) — 1. Forn > 6, f(n) > 2”, and for n > 7, g(n) > 2”. 

Proof. Every majority game has at least two roles, with the exception of the 
games having weights (1, 1, --- , 1), of which there is one for each odd number of 
players. In any case the number of n-player courts is at least 2f(n) — 1 and the 
number without dummies is at least 2g(n) — 1. Corollary 1 and Lemma 2 show 
that there are at least as many corresponding n + 1-player majority games; and 
clearly G cannot have any dummies when H;(G) has none. Then if f(n) > 2”, it 
follows that f(n + 1) > 2"**; and the list at the end of the paper shows that 
f(7) = 135 > 2". The same induction applies for g; and though g(7) is only 114, 
the list shows there are over 256 corresponding courts. 

The true order of f and g is not known. No respectable upper bound is known, 
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though one can make slight improvements on 2”, which is the number of sets of 
sets of players. (Of course, the lower bound 2” may be as wide of the mark.) 


3. Efficient sequences. In this section we shall consider only games without dum- 
mies; it is also convenient to reverse our convention as to ordering. Let a sequence 
be a finite non-decreasing sequence of positive integers. For any sequence w = 
(wi, Wn), let w(N) be the total w; ; for any subset S of N = {1, ---, 
let w(S) be 2; € s w; . Then w(S) is called the weight of S. 

Weights (w;) for a majority game are efficient provided w(N) is an odd number 
2p — 1 and every subset of weight greater than p contains a subset of weight ex- 
actly p. These conditions automatically make (w;) weights for a game, but the ab- 
sence of zeros does not imply the absence of dummies. (Example: 1, 2, 2, 2.) We 
are assuming there are no dummies; that is, every index is in some set of weight p. 
Then it is known [2, 3] that the efficient weights are uniquely determined by the 
game, namely as minimum integral weights, and that their growth is restricted by 
the inequality 


(+) w; 1+ 


For any sequence w, let us call the non-negative integer p acceptable for w pro- 
vided every set of weight >p contains a subset of weight exactly p. Clearly 0 is 
acceptable for each w, no integer from 1 to w, — 1 can be acceptable, a sum of 
acceptable numbers is acceptable, and all sufficiently large numbers are acceptable. 

Lemma 3. p > O is acceptable for (wi, --- ,Wn4:) if and only if both p and 
— Wns: are acceptable for (wi, --- , Wn). 

Proof. If p is acceptable for the long sequence it is clearly acceptable for a part 
of the sequence. If p — way: is not acceptable for (w:, --- , wa), one has a set S 
such that w(S) > p — way: and S contains no subset of weight exactly p — was: . 
Since the w; are increasing, one can assure that w(S) < p. Then Su {n + 1} can 
contain no subset of weight p, a contradiction. The converse is obvious. 

Now let us call p admissible for w if both p and w(N) + 1 — p are acceptable; 
p is proper if 1 S p S w(N). Let the sequence w be efficient provided it satisfies 
(*) and some proper p is admissible for w. 

THEOREM 2. A sequence w is an efficient sequence if and only if w is an initial 
segment of the (increasing) weights for some efficient weighted majority game. 

Proof. If w is an efficient sequence with proper admissible 


psp =w(N)+1-p, 


then p + p’ is acceptable, hence admissible, for (wi, --- , wa, p, p’), by four ap- 
plications of the lemma. To show that every index is in a set of weight p + p’ it 
suffices, by efficiency, to show that this is true of the least index. Suppose 7 is the 
least index actually used; if i > 1 then by (*), w; is < the sum of the preceding 
weights, and 7 can be replaced by a set of smaller indices without diminishing the 
weight, which reduces to a contradiction. 

For the converse, (*) is established in a previous paper [3], and the rest is clear 
from the lemma. Note that (*) assures that for n > 1, one of pand w(N) + 1 — p 
is >w, , so that subtracting w, leaves a proper integer. 
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THEOREM 3. Let (wi, --- , Wn) be efficient. Then +++, Wa, efficient 
is and only if (1) Wasi = Wa, (2) War S 1 + Dw, er (3) Wasi = h — k for 
some h and k which are admissible for (w; , --- , Wn). A proper p is admissible for 
(wi Wn+1) 


af and only if both p and p — Wn+: are admissible for (wi , --- Wn). 

Proof. The last statement follows from the lemma and the rest follows from that. 

Note that since 0 is always admissible, the possible values of w,4: always include 
the proper admissible numbers. There are not always at least two of these. However, 
it is known [3] that there are at least 2" * n-player games with efficient weights, for 
n = 4. From Theorem 3 and the proof of Theorem 2 we see that the number of 
efficient sequences of length n is an increasing function of n and lies between the 
numbers of efficient game weights of length n and of length n + 2. In the other 
direction we have 

TueroreM 4. For an efficient sequence (w , --- , Wn) there are at most n values of 
Wn41 which make (wi, --+ Wn41) efficient. Hence the number of efficient sequences of 
length n is between 2"~* and (n — 1)!. 

Proof. There are at most n acceptable integers s, between w, and w(N) + 1— wz, 
namely, Wn, Wn + Wn, ++: , w(N); for clearly if this sequence skips over p, 
8 <p < S41, then at step k + 1 we have a set of weight >p which drops below 
p if its smallest element is removed. Now w,+: itself need not be acceptable, but 
w(N) + 1 — way: is. For there are p and p’ acceptable for (w; , --- , w,) such that 
pt p = w(N) + way + 1. By the lemma, both p — way: and p’ — wry: are 
acceptable for (w; , --- , w,); hence so is their sum, which is w(N) + 1 — way. 
This establishes the inductive step, and a glance at the case n = 1 establishes the 
upper bound (n — 1)!. The lower bound 2”~ is established in [3] for n = 4, and 
the cases n = 1, 2, 3 are easily verified. 


4. The enumeration. A method for generating all majority games of n + 1 players 
from a list of all n-player majority games is as follows. List all the courst (H, r), 
H an n-player majority game and r a role of H. For each (H, r) there is at least 
one compatible game H,, as pointed out in Corollary 1 to Theorem 1. All compatible 
games must be determined. In some cases (e.g. when H; has weights (1, 0, --- , 0)) 
one can write down weights for the corresponding n + 1-player game G by rote. 
In general, however, one must screen the possible values of Hz and solve the sys- 
tems of linear inequalities defining weights for G. For hand computation it seems 
best to apply the necessary condition (a) of Theorem 1, and for each H, satisfying 
that condition, to attempt to solve the inequalities for weights for G with the side 
conditiont w; = 0. In a run such as the enumeration of the 7-player majority games 
one develops subroutines; for example, there are different courts H, , Hi’, such that 
Hz is compatible with H, if and only if Hz is compatible with Hy’, and one may 
make mental lists of the corresponding classes of games H» . 

It appears to be feasible though far from routine to code this method for automatic 
computation. No attempt has been made. For automatic computation it might be 
preferable simply to ignore the combinatorial conditions (a) and deal entirely in 
inequalities. However, (a) is a highly effective screen; for n < 6 every pair (H; , H2) 
satisfying (a) is compatible. 


al 
ar 
er 
N 
ti 
th 
ar 
ce 
ca 
fir 
co 
m 
WI 
id 
Ti 
ar 
m 
th 
= 
be 
N 
= 


- ON THE ENUMERATION OF MAJORITY GAMES 27 


The method expounded by von Neumann and Morgenstern [1] for enumerating 
all strong simple games involves, necessarily, a listing of winning sets (no other 
way is known for describing these games in general) ; and it also involves obtaining 
and then eliminating multiple isomorphic copies of each game. No other method is 
known for this problem. However, for generating a list of majority games, the pres- 
ent method is far superior. Note that if one were to employ the method of von 
Neumann and Morgenstern for this special purpose it would still be necessary to 
examine the inequalities determining whether each game has weights. 

The recursive enumeration of all efficient sequences involves additional informa- 
tion which is most easily generated from the beginning, the efficient sequence (w:), 
where w; = 1. One may lay out three columns on a sheet of paper, the first giving 
the sequence (w:, --- , Wn), the second the list of all integers from 0 to w(N) which 
are admissible for w, and the third the list of all differences of integers in the second 
column which satisfy the inequalities (1) and (2) of Theorem 3. The numbers in 
the third column are precisely the values of w,4: which can be used for an extension 
(wi, +++ ,Wn, Wai); Theorem 3 tells how to compute the second column for 
(wi, *** , Wn41), using the entries in the first column of this row and in the second 
column of the earlier row; then the new third column is computed from the new 
first and second columns. Coding this computation should be routine; the only 
complication is the irregularity of the size of the clumps of numbers. 

It would certainly be desirable to develop a method for the enumeration of all 
majority games which, like the method for efficient sequences, should deal directly 
with arithmetic properties of the weights. A prerequisite would seem to be a scheme 
for assigning unique weights (not necessarily integers) to each game. The obvious 
idea of taking minimal integral weights is not enough, because these are not unique. 
To see this, consider the two sets of weights (1, 3, 5, 6, 8, 11, 12, 23, 28, 31, 31, 38) 
and (1, 3, 5, 7, 8, 11, 12, 23, 28, 31, 31, 37). One may verify that these weights de- 
termine the same game. One then proves twelve lemmas which show that any 
weights for this game are at least as large as (1, 3, 5, 6, 8, 11, 12, 23, 28, 31, 31, 
37). First lemma: the first player is not a dummy; hence his weight in integers 
must be at least 1. The proof that the second weight must be at least 3 involves 
finding combinations of 1, 3, 11, and 12. Having all this, one observes that the 
twelve minimum values do not form weights for any game, since they permit a tie; 
there are no integers between them and the two sets of weights, and therefore both 
are minimal. 

It can be shown by similar arguments that the 135 sets of weights listed below 
are all minimum. (Criteria from a previous paper [3] aid in most of the verifica- 
tions.) In particular, no two of the 135 games are isomorphic. Comparable care has 
been taken to assure that no game is omitted. The list for n < 5 is taken from von 
Neumann and Morgenstern [1], and the list for n = 6 from Gurk and Isbell [2]. 


The 165 efficient weighted majority games of less than seven players 


1 11113 111114 111334 
111 11122 111123 112225 
1112 11223 111224 112335 
11111 111112 111233 
The 6 nonefficient weighted majority games of less than seven players 
111222 112234 122334 122345 
112223 122233 
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The 23 seven-player efficient weighted majority games 


1111111 1111223 1112226 1122227 
1111113 1111234 1112244 1122355 
1111115 1111335 1112336 1122557 
1111122 1111344 1112446 1123338 
1111124 1111445 1113337 1123558 
1111133 1112222 1113447 
The 91 seven-player nonefficient weighted majority games 

1111223 1123345 1223344 1234557 
1111333 1123356 1223346 1234568 
1112224 1123446 1223348 1234579 
1112233 1123457 1223357 1244567 
1112235 1133344 1223445 1244679 
1112334 1133355 1223447 2222333 
1112345 1133445 1223449 2223334 
1113335 1133456 1223456 2223345 
1113346 1133467 1223458 2223367 
1122223 1133557 1223467 2233344 
1122225 1133568 1223559 2233445 
1122234 1222233 1224457 2233456 
1122236 1222235 1224558 2233478 
1122245 1222334 1224569 2234455 
1122333 1222336 1233345 2234556 
1122335 1222345 1233446 2234567 
1122337 1222347 1233455 2234589 
1122344 1222356 1233457 2334456 
1122346 1222455 1233468 2344567 
1122445 1222556 1233556 2345678 
1122447 1222567 1233567 3334455 
1122456 1223335 1233578 3345567 
1123334 1223337 1234456 


1. J. Von Neumann & O. MorGenstTERN, Theory of Games and Economic Behavior, 
Penonten Univ. Press, New Jersey, 1944. 
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A Unified Process for the Evaluation of the Zeros 
of Polynomials over the Complex Number Field 


By John I. Derr 


Summary 
Newton’s method for the evaluation of the zeros of analytic functions is gen- 
eralized by the recurrence relation 


= 2; — (ky — i) f°? (2), 
where k; and 1; are determined so that ideally 1; + 1 = k; = k in the vicinity of a 


zero of order k. The process so defined is a second-order one and yields accurate 
approximations to zeros for all values of k, provided 1; + 1 = k; = k. 


1. Introduction. Consider the polynomial equation f(z) = 0 in a complex variable 
z, where f(z) has complex coefficients. To solve this equation by Newton’s method 
one uses the conventional recurrence relation 


(1) Zinn = 2; + (Az);, where (Az); = —f(z;)/f'(z;). 
Bodewig [1] has shown that Newton’s method is a second-order process if and only 


if the sequence {z,} converges to a simple zero of f(z). He has shown further that 
the recurrence relation 


(2) Zinn = 2; + (Az);, where (Az); = —k f(z;)/f'(z), 


defines a second-order process whenever {z,;} converges to a zero y of f(z) of multi- 
plicity k. 

The process defined by equation (2) leaves two problems unsolved: (i) In prac- 
tice k and y are usually not known,* (ii) The ultimate accuracy of the approxima- 
tion to y is limited by the fact that f(z,), f’(z;) — 0 as 2; y, provided k = 2. 

The unified process, which was developed to meet both of these problems, is 
characterized by the recurrence relation 


(3) = — (hee — Li) /f (zi), 


together with an explicit method for determining /; and k; which will be described 
in §2. Note that (3) follows from (2) if k; = k and f(z;) is replaced by f‘” (z,).t 
Hence the unified process is a second-order process provided 0 S 1; < k; = k. This 
justification of the unified process as a second-order process is the principal service 
provided by (2). 

Received 9 Apr., 1957; revised 21 July, 1958. 


*R.A. Brooker [2] describes a modification of (1) in which k is approximated by the 
smallest positive integer / such that 


| — (2) | < | flee — (ed 11, 


provided | f(zis1) | < | f(z) |. 

+ The author is indebted to the referee for pointing out this simple connection between 
(2) and (3). This connection has influenced considerably the presentation of the first two sec- 
tions. 
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The principal result of this paper is the formulation of a method for determining 
1; and k; in such a way that 1; — k — 1 in a certain sense as z; — y. In the stand- 
ard application of Newton’s method to a polynomial with multiple zeros the Eucli- 
dean algorithm would first be applied to eliminate the multiple zeros. (See [3].) 
This greatest common divisor process is, however, totally foreign to Newton’s 
method, whereas our determination of /; and k; is intimately related to the compu- 
tation of f“?(z;) and f“**(z;) in (3). This close connection motivated the choice 
of the term “unified process.” 


2. The Unified Process. Let f(z) = > ?-0cz', where n = 1, cn = 1, and ¢; is 
a complex number for z = 0, 1, ---,”. Throughout this section we shall assume 
that y is a zero of f(z) of multiplicity k. Thus we can write f(z) = (z — y)“q(z), 
where q(yv) 0. 


Lemma 1. 
f"(z) _ (2 — 2) 
f(z) k—m + — for m = 0, 1, ,k 1. 


Bodewig [1] obtains the same result for m = 0. The lemma follows then by ap- 
plication of that result with f(z) replaced by f(z). From Lemma 1 (with m = 1;) 
equation (3) becomes z;41 = z; — (k; — — y)/(k — + Of(z; — 
So if k; = k we have z;4, = y + O[(z; — y)’]. This result shows that equation (3) 
does define a second-order process when 0 S 1; < k; = k. The main application of 
Lemma 1 will occur below in the determination of k; . 

Suppose that z; is a “close approximation” to y. Let 7 be a fixed, sufficiently small 
positive number. Then /; is defined as the smallest non-negative integer such that 
(zs) | 2. 

Let us assume for the moment that /; > 0 if and only if k > 1. We will justify 
this assumption below in Lemma 2. Then Lemma 1, when applied for m = 1; and 
l; — 1 yields 


[Zar | (z;) = (k — 1) Fea +0 - 


The preceding equation suggests that we let k; = 1; + @ — 1 where ¢ is the closest 
integer to 


(2; yr" (2; ) 


We can now give an explicit description of how the unified process operates to 
extract an arbitrary and unknown zero of f(z). Let 7 and the starting value z, be 
given. Then z;4; is determined recursively from z; by means of equation (3), where 
for each 7, 1; is the first non-negative integer to satisfy | f“**”(z,) | = », and k;, is 
1 or 1; +  — 1 depending on whether /; is respectively zero or positive. To obtain 
#, round z as given in (4) to the nearest integer. 

We know that the unified process defines a second-order process if 


* 0(z) is a quantity which vanishes at least as fast as z, i.e., | 0(x)/z | is bounded as z — 0. 
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In looking ahead to the next section it is evident from accuracy considerations that 
in practice we would prefer that 1; = k; — 1 = k — 1; for note that then equa- 
tion (3) becomes 


(5) Zinn = — /f(z,), and f*(y) ¥ 0. 


The following lemma shows that the situation where 1; = k — 1 is always obtained 
as soon as 2; is close enough to y. 

LemMa 2. Suppose z; — y, where {z;} is determined by the unified process. Then 
there exist positive integers Ni, N2(Ni S Ne), such that 0 < 1; S k — 1 for 
N, and l; =k-—- 1 fori = N2. 

The proof of Lemma 2 follows from the continuity of f”(z) for m = 0,1, --- , k; 
the fact that f(y) ¥ 0 while f(y) = 0 for m = 0, 1, --- ,k& — 1; and finally 
the fact that 0 < m < 1; implies | f“” (z;) | < ». 

The following restriction on 7 follows immediately. 

Condition on n. Suppose {z;} is determined by the unified process. Then 2; — y 
implies that | f“(y) | = 1, where k is the multiplicity of y. 

Example 1. Suppose, for convenience, that » < .01. Let 


f(z) = (2-19 = 41. 


Let 2 = 2”. It follows that Az = —f/f’ = —(z — 1/z)/4 = —2/4 + O(1/z). 
Then z > 2”, and by induction z, > 2. 

Also, f(1 + 8) = 48° + 48° + &*, and f’(1 + 8) = 88 + 128° + 48°. Therefore, 
Az = —8/2 + 0(8&). It is easy to verify that 1 + » < zy, < 2 for some N,; > n. 
Thus f’(zy,) 2 7» and we have Jy, + 1 = 1 < k. However, we do indeed have 
= 

Next choose N2 = N; such that i > N2 implies 2 < 1+ 7°. Then i > N; im- 
plies | f’(z) | < | f’(1 + 9°) | < n/2. Therefore, 1; => 1. Since 


f"(1 +8) = 8 + 246 + 128°, 


it follows that 1; + 1 = k for all > N.. Thus N; and N; satisfy the conditions 
of Lemma 2. 

In summary the behavior of the process for large real z in this example is that 
which would be expected for a 4th order multiple root at the origin, for with 


f(z) =2 


we get f(z)/f'(z) = 2/4 and | f’(z) | > ; and we have Az = —z/4 + O(1/z). 
As 2; — 1, however, the behavior approaches that predicted by Lemma 1 near a 
double root at z = 1, ie., (2 — y)/(k — m) = (z — 1)/2, when m = 0. We 
have seen that Az = —(z — 1)/2 + O[(z — 1)’). 

In this section we have considered the unified process in an abstract setting in 
the sense that we have passed to the limit and employed complex numbers in the 
ordinary fashion. By so doing we have been able to present the basic facts about 
the process, especially as regards convergence, in a form unencumbered by the com- 
plications we shall encounter in the next section. There we shall introduce an ap- 
proximate convergence for a class of complex numbers whose real and imaginary 
parts have decimal expansions of bounded length. We can then indicate precisely 
how the second problem mentioned in §1 is relevant. 
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It is clear that we have already solved both of the problems set forth in §1 in the 
sense that if we have convergence at all, then the convergence is of the second order 
and eventually the situation in (5) holds. As to when the process does converge, 
we refer the reader to the existence theorems for Newton’s method given by Ostrow- 
ski [8]. However, it is not obvious how anything useful (in a practical sense) can 
result from a non-trivial adaptation of such theorems to the unified process. On 
the other hand see Brooker [2] for an example where Newton’s method does not 
converge. 


3. Numerical Application. From now on we consider only those complex num- 
bers whose real and imaginary parts contain p decimal digits. Consistent with our 
“rounded-off”’ numbers we now define the numerical convergence of truncated 
sequences {z;} of such numbers. Let ¢ > 0 be fixed and let M be a positive in- 
teger, which depends only on {z,}. We say that {z,} converges if there exists a 
positive integer N < M such that | (Az)y| zy |. We also assume without 
loss of generality that f(0) ~ 0, i.e., co ¥ 0. 

Situations producing non-convergence usually fall into one of two classes: (i) 2, 
occasionally lies in a sufficiently small neighborhood of a root of f’(z)/f(z). This 
occurs frequently, since by Lucas’ theorem any convex polygon containing all of 
the roots of f(z) contains all of the roots of f’(z). (See [5].) (ii) A subsequence of 
{z:} is eyelic. 

Class (ii) may occur due to round-off error, e.g., because of lack of precision in 
the coefficients of f(z) or because of multiple roots if k; < k. In addition there are 
bona fide situations like the one referred to at the end of §2, where accuracy con- 
siderations play no part. We employ a systematic method for selecting a new value 
for z: whenever non-convergence occurs. - 

We now modify our method for determining /; and k; so as to be consistent with 
our approximate arithmetic. We first define /; to be the smallest non-negative in- 
teger (if such exists) such that 


(i) | (2s) | 2 (0) | and 
(ii) | (2) | < | (0) |, whenever J; > 0. 


In the above expressions {q@m(z)} are the quotient polynomials which are obtained 
by dividing f(z) successively by z — z;. More precisely, define recursively 


f(z) = qolz) = (2 — 2:)qi(z) + f(z) 


and gm(z) = (2 — 2:)@m4i(Z) + Qm(2:) for m = 1, 2, --- ,1;. Note that in general 
f"(z:) = mlqm(z:). If no such 1; < n exists, replace z; by 2; — (Az);./2. 

Condition (ii) has been added to the original definition of 1; in order to help 
detect non-convergence class (i) and to separate closely bunched zeros. Example 2 
will show how logio | ¢x(0)/qm(z:) | gives a measure of the number of significant 
digits lost in the evaluation of f°” (z;). We also encounter in Example 2 a limiting 
situation where two zeros are just close enough so that condition (ii) fails to sepa- 
rate them. 

The determination of k; differs (when /; > 0) only by the manner in which @ is 
determined. As regards 7, let 5 be a fixed small positive number. Then 
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(i) Set @ = j if x is well-defined by (4) and there exists an integer j, 

such that |x — j| < 6. 

(ii) Set ¢ = 2 otherwise. 
Note that (ii) results in setting k; = 1; + 1. Reeall that k; = ¢ + 1; — 1 when 
l; > 0. 

As motivation for (i) we know that if y is a multiple root and z; is so close to y 
that 1; > 1, then x should be close to an integer which is bounded below by 2 and 


above by n — 1; + 1. When the corresponding smallest value of #—namely, ¢ = 2 
—is taken on, then /; + 1 = k; = k, while k; < n implies 


We make the assumption that case (ii) reflects a loss of significant digits in the 
computation of f“”(z;)/f“?(z;), due to the nearness of z; to a multiple root +. 
The assumption is biased toward the selection of multiple roots as Example 2 indi- 
cates. In the discussion of Example 2 a modification of (ii) is described which is 
biased toward the separation of roots. 

Example 2. Suppose 7 < .1 and consider 


f(z) = 2 — (24+ (14+) = (2-1) (2-1-7). 
Suppose zy = 1. Then q:(0) = —1 — n and f’(zvy) = Hence, 
| q(zw) | < q(0) | 


and ly > 0. Further, q.(0) = 1 implies /y = 1. 
Since f“"” (zy) = 0, = 1 and ky = ly + 1. It follows that (Az)y = »/2.* 
But go(1 + 7/2) = —7'/4 and f’(1 + /2) = 0. Therefore, ly, = 1 and 


(Az) = 0. 


As a result 1 + 7/2 is identified as a double root of f(z). Observe that the final 
step in the computation of go(1 + 7/2) is the addition of go(0) and —1 — » — 7/4. 

The example also shows that sequences {z;} can converge to numbers which differ 
from roots of f(z) by more than could be attributed to round-off error. In particular, 
the distinct roots 1 and 1 + 7 are treated as one double root equal to the mean 
value of 1 and 1 + 7. We shall indicate how this situation may generalize. By mak- 
ing use of the following lemma it suffices to produce a set of roots and a value of 
zy such that 1; = n — 1. 

Lemma 3. Let f(z) be an arbitrary polynomial of degree n having roots yj, --- , 
vn With multiplicities m , --- , m, , respectively. Then for every complex number 
z we have 


(zo) /f (zo) = 


In particular, if f(z) = (z — y)", then 2 — f” °(z0)/f"(zo) = y. Here we have a 
trivial case of “infinite order” of convergence, i.e., when 1; = n — 1, the problem 


* Note that if the determination of k;(l; > 0) were modified so that when | z — 1| < 4, 
k; = l; and 1; is decreased by 1, then we would have ly = 0 and (Az)y = 0. 
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of finding y under the unified process generalizes the trivial problem of solving a 
linear equation by Newton’s method. 

We know that for every 7 there exist polynomials such that distinct roots will be 
identified as multiple roots. Clearly, inorder to remedy this situation in any particu- 


lar case we must decrease /; and hence 7. However, we are at one side of a dilemma. 
For consider the case where y actually is a multiple root. Then if 7 is sufficiently 
small, we must have z; very close to y before /; can exceed 0—which means that as 


n is decreased, multiple roots tend more to be split into roots of lower multiplicities 
with their maximum accuracy limited to p/(k — k; + 1) digits. Recall from Ex- 
ample 1 that Az = —(z — 1)/2 + O[(z — 1)’] for z near 1. Thus z = 1 + 7 would 
satisfy our numerical convergence criterion if 7 < ¢ and be identified as a simple 
zero. 

The fact that z = 1 + 7 would be determined to be within ¢ distance of the dou- 
ble root at 1, although of the wrong multiplicity, would probably suffice for most 
practical cases. However, there do exist situations where the diagnosis of multi- 


1 
plicity is important in its own right, e.g., [ dx/~/ x(x — €) converges, whereas 


1 
[ dx/+/zx? does not. Moreover, in addition to such special cases, the best results 


are usually obtained when 1; + 1 = k; = k. 

We have observed that /; is a nondecreasing function of 7. Observe, however, 
that the approximate solutions obtained by the unified process do not depend con- 
tinuously on 9, since 1; and k; are integer-valued functions. We have also seen that 
an optimum value of 7 depends strongly upon the distribution of the roots near 
z;. Roughly speaking, if several distinct roots are bunched together, 7 should be 
“small.”* On the other hand, if only a multiple root-is near z;, then » should be 
“large.” For general calculations we recommend that » = Ve and 1/e < 10”. 


4. Computational Examples. In this section we shall extend the discussion of the 
preceding examples. In particular we shall record and interpret some solutions 
which were obtained on The RAND Corporation JOHNNIAC computer using 
nine-digit floating decimal (scientific notation) arithmetic (p = 9) with « = 10, 
and 6 = 10°. We illustrate the sensitivity of the process to 7 by considering » = 
10° and = 10%. 

Example 3. We consider Example 1 again, i.e., f(z) = 2‘ — 22" + 1. With » = 10“ 
the pairt of complex conjugate roots, 1 + i(.663098208-10~*), and the associated 
reduced polynomial, g(z) = 2° + 2z + 1, were first obtained. Note that in addi- 
tion to the fact that the double root +1 was approximated by a pair of simple 
roots, the modulus of the relative error substantially exceeds e. 

Recall that f(1 + @) ~ 46° and f’(1 + 6) ~ 86 as 6 — 0. Thus at 


z; = 1 + 7(.663098208-10*), 


we have f(z;) = —2-10°° and f’(z;) = i(.5-10°°). However, due to round-off er- 
ror the evaluation of f(z;) yields zero. Therefore, (Az); = 0, since | f’(z;) | > ». 


* In the next section we shall show that this is not always the case. 
+ The roots were actually obtained in succession. When a root is determined the reduced 
quotient polynomial q;(z) replaces f(z). 0 
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On the other hand, with 7 = 10 we obtained the double roots 
1 + 7(.000000015-10~) 
and —.999999998 + i(.753350000-10-*). Here the relative error is less than ¢ in 
either case. The first double root was obtained because now .5-10° < 7. 
Example 4. The polynomial in Example 2 is now generalized to the class of poly- 


nomials of the form z* — (2 + @)z + (1 + @). The results tabulated immediately 
below were obtained with » = 10™~. 


6 Roots | Error | 

1.09999996 — i(.000000001 -10~* ) .00000004 
1.00000004 — i(.490000000- 107°) 

01 1.00999900 — i(.000000001 -10~**) .000001 
1.00000100 + 

001 1.00099296 — i(.000000002-10~*”) 00000704 
1.00000704 + 

1.00005000 (double root) 00005 

.00001 1.00000500 (double root) .000005 


With » = 10° and everything else unchanged, the results obtained were the 
same except that for 6 = .001 we obtained 1.00050000 as a double root with an 
error of .0005 in magnitude. 

If z = 1 + A, we have f(z) = A(A — @) and f’(z) = 2A — 6. With 1; = 0 it is 
clear that as z,;—> 1 + 0, A; @ and 


Ai(@ — Ai) 
A; + (A; — 6) 


Such a situation can always be arranged if 7 < @, e.g., if 7 = 10 and @ = 10°. 
Under these circumstances {z;} converges numerically as soon as | @ — A;| < «.* 
However, since we are using nine-digit arithmetic, (Az); cannot be computed with 
enough accuracy to make the comparison with ¢ unless | @ — A;| 2 ¢/(10A,;); for 
otherwise | A;(A; — @) | has less than one significant digit. Therefore, 1/(10°@) 
yields a close estimate for the magnitude of relative error. The estimate accounts 
for the increase in error as 6 decreases from .1 to .001 for the results tabulated above. 

We conclude the discussion of Example 4 by making several remarks which have 
general implications. 

(i) When roots are bunched close together, the magnitude of relative error can 
exceed ¢ even when the multiplicity is correct. 

(ii) If the roots are sufficiently close together, then the relative error will be de- 
creased if the multiplicity is determined at certain values greater than the true 
multiplicity. In Example 4, 1 and 1 + 6 will be sufficiently close as soon as 


1/(10°0) = 0/2, i.e. = +/2-10~. 


(Az); = = (6 — A,) + — A,)’). 


5. Conclusion. Notwithstanding the tremendous effort which has been expended 
on the problem over the years, the recent appearance of the high-speed computer 


* The argument for the remainder of this section is not precise, but only as a result of sec- 
ondary considerations. 
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and the corresponding increase in size of the practical problems which can be at- 
tacked have created new demands for computational methods. For example, the 
methods advocated as late as 1952 by Olver [7], being non-iterative in character, are 
quite different from those which are considered here. One explanation for the differ- 
ence in emphasis might be that a computer is very efficient at executing repetitively 
a simple sequence of operations. For the unified process a great bulk of the computa- 
tion involves one basic operation, namely, synthetic division performed with com- 
plex numbers. When applied to a computer the unified process is flexible and auto- 
matic and can be further controlled by manipulating sensitive quantities such as 
n, 6, €, and z. 

For comparison purposes we consider two methods which have proved useful for 
digital computers. Under certain circumstances each method evaluates roots with 
about one-half the effort required by the unified process. However, both methods 
require special preparation in the case of multiple roots. 

Hitchcock’s method [4] requires that f(z) have real coefficients and utilizes the 
resulting decomposition of f(z) over the reals into linear and quadratic factors to 
avoid complex arithmetic in the synthetic divisions. Otherwise, the method is es- 
sentially. Newton’s method. 

Muller’s method does not require the evaluation of f’(z;). When the degree of 
f(z) is large, a saving of close to 50% results. The order of convergence of the proc- 
ess is 1.84 for simple roots and 1.23 for double roots. Apparently, no improvement 
over Newton’s method obtains for multiple roots whose order exceeds two. 

Most of our discussion remains valid for arbitrary analytic functions. §2 goes 
through intact, but the application described in §3 requires some modifications. 
Moreover, for the method to be useful in practice the class of functions considered 
should be restricted to those which have a non-empty, finite set of zeros and for 
which the derivatives are easy to compute. 
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Numerical Integration Formulas for use with 
Weight Functions x and z/V 1 — 


By R. E. Greenwood, P. D. M. Carnahan and J. W. Nolley 
1. Introduction. Numerical integration formulas of the type 


(1) [f@a~e, 


were investigated by Chebyshev [1]. It will be noted in correspondence (1) that 
all values of f used on the right hand side are given the same weight. This is ad- 
vantageous when the values of f represent experimental measurements. Salzer [2] 
has given tables of {x;,,} for n = 1, 2, 3, --- , 7 and n = 9. Other values of n re- 
quire values of 2;,, outside of the interval (—1, 1). 

Chebyshev also investigated numerical integration formulas of the type 


(2) los 


Greenwood and Danford [3] investigated numerical integration formulas of the 
type 


~C, f(xi,n). 


(3) [ 3 fain) 


These would appear to be of use when the first moment of a distribution is needed. 
The known usable cases for (3) are few in number, n = 1, 2, 3. Correspondence (4) 
may be used for m = 1, 2, 3, 4. 

The usual procedure for getting as much “accuracy” as possible into the cor- 
respondences is to require that they be exact for as many of the functions 1, z, 
a’, +++, +++ as possible. For correspondences (1), (2), (3), the requirement 
of equality for the function f(z) = 1 yields a value for C,, . Simple arguments sug- 
gest that if correspondences (1) and (3) are to be exact for polynomials of degree 
n, (i.e. linear combinations of 1, x, x’, --- , x"), then there must be n abscissae 
used. Correspondence (4) requires a different approach, one such approach using 
continued fractions was suggested by Chebyshev [1]. Since the same abscissa value 
is used twice in correspondence (4), simple arguments suggest that correspondence 
(4) can be made exact for polynomials of degree 2m. 


2. Weight Function x/+/1 — x*. Numerical integration formulas of the type 


(5) f(x) dz = Ke f(- — Xi, m)] 


laws 


Received March 14, 1958. 
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TaBLeE 1. Weights and Abscissae for x/ V1—2z 


m=1 k, = 0.9068 9968 2, = 0.8660 2540 8, = 60° 0’ 0” 
m = 2 ike = 0.4767 8538 x, = 0.6691 3061 & = 42° 0’ 0” 
x2 = 0.9781 4760 8 = 78° 0’ 0” 
ake = 0.7714 5495 x, = 0.1045 2846 
2 = 0.9135 4546 82 = 66° 0’ 0” 

m=3 tks = 0.3233 1355 z,=0 7888 8; = 33° 56’ 37.54” 

a2 = 0.8915 0871 82 = 63° 3’ 48.41” 

x3 = 0.9793 2755 83 = 78° 19’ 46.67” 

ok; = 0.4369 1949 2, = 0.0754 1781 & = 4° 19’ 30.83” 

2 = 0.7439 8645 82 = 48° 4’ 19.62” 

2x3 = 0.9781 7670 83 = 78° 0’ 28.88” 

sks = 0.6522 1071 a, = —0.1530 9943 8, = — 8° 48’ 23.71” 

t2 = 0.4156 7465 82 = 24° 33’ 42.52” 

z3 = 0.9416 3401 8; = 70° 19’ 40.09” 

sks = 0.9613 1418 a, = —0.7896 4766 8, = —52° 9’ 9.35” 

2 = 0.6842 8078 82 = 43° 10’ 44.65” 

23 = 0.9223 7153 83; = 67° 16’ 31.02” 

m=4 ik, = 0.2445 2340 2; = 0.4869 8884 8, = 29° 8’ 34.29” 

= 0.8177 1937 82 = 54° 51’ 25.71” 

x3 = 0.9073 5870 83 = 65° 8’ 34.29” 

= 0.9998 8810 = 89° 34.29” 

ok, = 0.3049 1569 a, = 0.0598 0415 8: = 3° 25’ 42.86” 

a2 = 0.6351 1577 82 = 39° 25’ 42.86” 

a3 = 0.8943 7741 83 = 63° 25’ 42.86” 

4 = 0.9864 9059 8 = 80° 34’ 17.14” 

sk, = 0.3956 4717 za, = —0.1193 9422 8; = —6° 51’ 25.71” 

2 = 0.3232 0966 82 = 18° 51’ 25.71” 

x; = 0.8001 3355 83 = 53° 8’ 34.29” 

x, = 0.9811 4838 8 = 78° 51’ 25.71” 

ak, = 0.4933 6396 2, = —0.5383 5061 8; = —32° 34’ 17.14” 

2 = 0.4606 4244 82 = 27° 25’ 42.86” 

x3 = 0.7017 9790 83 = 44° 34’ 17.14” 

at, = 0.9678 3475 8, = 75° 25’ 42.86” 

sk, = 0.5494 3910 a, = —0.3792 2546 8; = —22° 17’ 8.57” 

22 = 0.2370 8038 82 = 13° 42’ 51.43” 

a3 = 0.6117 2430 83 = 37° 51.43” 

a, = 0.9598 7524 - 8s, = 73° 42’ 51.43” 

ck, = 0.8890 1113 2, = —0.8506 8007 8: = —58° 17’ 8.57” 

= 0.0299 1547 = 1° 42’ 51.43” 

23 = 0.7628 2957 83 = 49° 42’ 51.43” 

= 0.9413 8647 = 70° 17’ 8.57” 


were studied by one of the co-authors of this paper in connection with the prepara- 
tion of his master’s thesis [4]. The substitution z = sin s changes correspondence 
(5) to the form 


2/2 m 
(6) (sim 8) ds ~ Uf (sin 81m) — f 


A fairly sophisticated method for evaluation of the z;,,, was used by Chebyshev 
[1]; that method was followed by the authors of this paper. 

All calculations were performed on desk machines and carried to ten decimal 
places and then rounded off to eight decimal places. The results obtained are tabu- 
lated in Table 1. All values of 2;,, and s;,,, are tabulated with the corresponding 
value of k,, and hence with the corresponding value of m. In some cases more than 
one value of k,, was found. In some places certain subscripts not needed for clarity 
have been dropped. When more than one value of k was found a prescript was 
added to k» . 

The similarities in the s;,, values in the minutes and seconds columns is rather 


odd. 
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TaBLE 2. Weights and Abscissae for x* 


n=1 c = 0.66666 66667 zx, = 0.00000 00000 

n=2 c = 0.33333 33333 a, = 0.77459 66692 = —z, 

n=3 c = 0.22222 22222 a, = 0.94868 32981 = —z; 
= 0 

n=A4 c = 0.16666 66667 a = 0.92836 49436 = —2z, 
= 0.58149 68029 = 

n= 6 e = 0.11111 11111 = 0.94100 74697 = —z, 
= 0.81334 08269 = 
= 0.25298 16413 = 


The derivation suggests that the numerical integration correspondence relation 
(5) should be exact for a polynomial of degree 2m + 1 or less. Indeed, since the 
trivial identity 0 = 0 results when f(x) is chosen as an even function, one can con- 
sider the set of m + 1 functions fi(x) = 2, fs(x) = 2°, , fomaa(z) = and 
require that correspondence (5) be exact for this set of m + 1 functions. Thus the 
set of equations for 7 = 0, 1, 2, --- ,m 


\ 


(7) dz = | at | 


2j+1 

= 2m [= | 

could have been used to determine the parameters k,, and x;,m, 7% = 1, 2, ---, m. 
However, this set of simultaneous non-linear equations is much more difficult to 
solve than the Chebyshev equations. Equations (7) may be used as a check set, 
and indeed the data of table 1 were checked by this method. Except for slight dis- 
crepancies of three or less units in the eighth decimal place the check equations 
above were verified. Such discrepancies are well within the possible round-off error. 


3. Weight Function x*. Numerical integration formulas of the type 
1 n 
(8) [ de ~ fain) 


were recently studied by another co-author of this paper in connection with the 
preparation of his master’s thesis [5]. Again, a method suggested by Chebyshev [1] 
was used to outline the steps in the computation. Chebyshev’s method required that 
the roots of an nth degree polynomial be obtained. A previously prepared program 
for determining roots of a polynomial was used on an automatic digital computer 
to get seven decimal place values for the roots. Hand methods and Newton’s itera- 
tion were then used to get ten or eleven decimal place values. These values were 
rounded off to ten places, and are given in Table 2. 

The polynomials for n’= 5, 7, 8 had at least one pair of imaginary roots. Nu- 
merical quadrature formulas using imaginary abscissas are of doubtful utility and 
hence the roots are not tabulated for these values of n. 

The derivation suggests that the numerical integration correspondence relation 
(8) should be exact for a polynomial of degree n or less. Consider, then, the set of 
n + 1 functions fo(x) = 1, fi(x) = 2, fo(x) = 2°, --- ,fa(z) = x”, and require that 
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correspondence (8) be exact for each of these (n + 1) functions. The resulting 
equation for fo(z) = 1 determines the value C, = 2/(3n). The other equations 
are of the type 


(9) [ dt = Cy 
j = 1,2, --- , n. With the value of C, given above, these n nonlinear simultaneous 
equations could have been used to determine the n abscissas 2 , %2,---, %,. In 
practice, equations (9) were used as a check set on the values obtained following 
the Chebyshev procedure. Except for slight discrepancies of two or less units in the 
tenth decimal place the check equations were all verified. Such discrepancies are 
well within the possible round-off error. 


4. Conclusions. Comparison between the Newton-Cotes rule for numerical 
integration of (2/+/1 — x?) f(x) on (—1, 1) and the Chebyshev method presented 
in 2 for f(x) suggest that the Chebyshev method is faster and “more accurate” 
for a reasonably wide class of functions f(x). Of course, this wide class of functions 
does not include f(z) = (./(1 — 2*)/zx) (polynomial in x) for which the Newton- 
Cotes method could be expected to be faster and more accurate. 

Similar statements could be made when the weight function x” is used. Indeed, 
these numerical integration methods apply best when the value of the integral 


[, ware) ae 


is desired and when f(x) is given empirically and where w(x) is the given weight 
function. Thus, the abscissas given in table 2 would be quite useful in empirical 
determinations of second moments. 
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Two Theorems on Inverses of Finite Segments 
of the Generalized Hilbert Matrix 


By Richard B. Smith 
1. Introduction. A quick check on the inverse S, of finite segments of the 
generalized Hilbert matrix H, = (hij), hij = 1/(p +i+ 9 — 1),7,j = 1,2,---, 


n, p # —1, —2,--- , —(2n — 1), may be made by summation of the n’ elements 
of the inverse. The summation requires complete accuracy. 


2. Inversion of the Matrix. The inverse of this matrix has been derived by 
Savage and Lukacs [1] for p = 0 and by Collar [2] for nonnegative integer values of 
p. By using the method employed by Savage and Lukacs which is based on a formula 
in [4], the element in the 7th row and jth column of S, is 


pPtitj—1L@— — Kj jp! 
where p ¥ —1, —2,--- , —(2n — 1). 


(1) Si = 


3. Summation Identity. Let binomial coefficients of the form C.” = 0 for s < 0 
and s > r. Then 


(2) = (-1)"Cs 
follows from formula (27) in [5].* 


4. Theorem I. Let Si? be defined by (1), then 


(3) = n(p + 2) 
j= 
where p ¥ —1, —2,---, —(2n — 1). 
When n = 1 or 2 equation (3) is easily verified. By assuming (3) for n it re- 
mains to be shown that 


n+1 
an = (n+ 1)(p+n+1) 
=n(p+n)+p+2n+1, or 
(4) Se — = pt 


where p # —1, —2,---, — [2(n + 1) — 1]. By substitution of (1) for Si,, and 
S in (4) we have 


Received March 4, 1957; revised April 7, 1958. 
* The author is indebted to the referee for this reference. 
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ptitj—-1@— iim — — 7)! 


nt+1 n—1 


i IT (+i +k) 
+ 2(—1) (-1) G—Din+1-)D! 


Il 
+ (p+ 2n+1)| = (p+ 2n + 1) 


Let i — 1 = j and this becomes 
(5) (p+ 2m |- 
Take the sum from (5) as the coefficient of x?*’ and consider 
n (p +j + k) n j D™ 


where D = d/dzx. Differentiating as x”-x**" we have 


G+n)! 
OM 


n i-—1 n 
n i (j +n)! 


Let x = 1. Then 


and considering (2) we have f(1) = (—1)*. Thus (5) becomes p + 2n + 1. 
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5. Theorem II. Let Si? be defined by (1), then 


> ij si ji 
1] (6) 2, Sn (j 


where j = 1, 2,---,nand p ¥ —1, —2,--- , —(2n — 1). 
The proof is similar to the one given for Theorem I. 
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Sequential Minimax Search for a Zero Pr 
of a Convex Function 


By O. Gross and S. M. Johnson 


1. Introduction. Many methods have been proposed and used for the computa- ot 
tion of real roots of equations in a single real variable. Notable among these is 
Newton’s Tangent Method. However, this procedure is quite costly in many in- 
stances wherein the function of which the zero is desired is inordinately compli- 
cated; indeed, it is usually the case—polynomials and other simple functions 
excepted—that derivatives are more complicated than the functions themselves. Y 
This last circumstance constitutes the principal motivation of later methods, such 


m 
as the one given in [1], that are based solely upon evaluations of the functions in 
themselves. To our knowledge, none of these later methods are sequential minimax 
in character except for the bisection procedure, and this only in a restricted sense. e 
Additional motivation for our method is provided by the observation that once 
a root is located on a sufficiently small interval by a positive and negative reading, p 
the function will be inclined to be convex or concave throughout the interval, at pr 
least for typical analytic functions. 
2. Description of the Procedure. In what follows, we shall describe a numerical 
procedure for solving the following problem: | 
“We know initially a positive and a negative value of a function at two given 3 


points. The function is continuous and convex and is otherwise unknown but com- 
putable. Given an integer n > 0, how do we locate its root within an interval of P 
minimum length in n steps, where a step consists in calculating the value of the 
function at any point we choose?” The question has no definite answer until we 


specify, for example, that our procedure be sequential minimax; i.e., at each step . 
of the procedure we assume that the worst possible situation might occur from that . 
point on in the light of our present information about the function, and proceed to ; 
evaluate the function at a point that hedges against all contingencies so as to ' 
guarantee no more than a fixed interval length at that stage. " 


We now describe the procedure cycle: 
Suppose we are in the situation in which we know 


f(a) = Y.>0 and f(b) = —¥,,¥,>0, where a <b; 


further, the root is greater than S, where S is known and we have n more readings 


to make. 
Then we have bracketed the root on the interval (S, W), where 
W=a + (b a) Y.+ 


If n = 0, the computation ceases and the values S, W are recorded. 


Received July 10, 1957. 
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If n > 0, calculate the value of f at z = 8 + (W — S) p,(Y,/Y.). (Graphs of 
pn for n = 1, 2, 3, 4 are included in Fig. 5 at the end of this paper.) 
If f(z) = Y. > 0, set a’ = z, b’ = band 


(x — a)Y./(Y. — Ys). 


If, however, f(z) = —Y. < 0, seta’ = a,b’ = z, and if Y. > Y, set S’ = S; 
otherwise, set 


S’ = max (S, x — (b — x)Y./(¥s — Y.)). 


Finally, set n’ = n — 1. 

Then we ere in the situation in which we know f(a’) = Y. > 0, f(b’) = —Yvw, 
Yy > 0, where a’ < 6’; we know that the root is greater than S’; and we have n’ 
more readings to make. This completes the cycle. (As the problem is stated, we 
initially have S = a.) 

In the next section, we shall illustrate how this procedure works on a particular 
example. 

Remark 1. The foregoing procedure is an approximation to the actual minimax 
procedure. The theoretically correct procedure involves replacing the expression 
pn( Y/Y.) in the formula for x above by 


7) 
a’ 
where p,(S, Y) is defined in Sec. 4. Since the function p,(S, Y) is relatively in- 


sensitive to S in our choice of p,(S, Y) near the minimax, we feel that the ap- 
proximation 


pn(0, Y) pa(S, 


is justified, and define p,(Y) = p,(0, Y). This approximation renders the pro- 
cedure more adaptable to machine computation. 

Remark 2. Since the average digital computer has difficulty in reading graphs, 
in order to program the procedure one would first find suitable algebraic approxi- 
mations to the graphs of p, after the fashion of C. Hastings, Jr. [3]. 


3. A Numerical Application—Comparison with the Bisection Technique. Sup- 
pose we want to bracket a zero of a certain function f defined over the interval 
(0, 1). We know that f is continuous and convex, and that f(0) = 1, f(1) = —1. 
As a simple example, let f(x) be given by f(z) = max (—1, (x — 1/3)(z/2 — 3)). 
We intend to make three evaluations of the function. Referring to the graph of 
R;(0, Y) in Fig. 6, with Y = Y/Y: = 1, we see that we can guarantee locating 
the root on an interval of length 0.01 times the length of the original interval. How- 
ever, since the graphs represent the worst that can happen, we expect to do much 
better. 

We proceed to calculate: 

Cyciz No. 1. We have a = 0, b = 1, Y. = 1, Ys = 1, S = 0, = 3, whence 
by our formula we obtain W = 0.5, and we have located the root on (0, 0.5). Next, 
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we compute zt = 0 + (0.5 — 0)p;(1) = 0.5(0.148) = 0.074, and we find that 
f(0.074) = 0.76839 > 0, so that a’ = 0.074, b’ = 1, and 
0.074(0.76839) _ 
Finally, n’ = 2. 
Cyctz No. 2. Dropping primes on the new variables, we have a = 0.074, b = 1, 
Y, = 0.76839, Y, = 1, S = 0.31950, n = 2, whence by our formula we obtain 


0.926(0.76839) _ 
W = 0.074 + —T 76839 0.47636, 
and we have located the root on (0.31950, 0.47636). Next, we compute 
1 
xz = 0.31950 + 0.15686p2 (crass) = 0.31950 + (0.15686)(0.198) = 0.35066, 


and we find that {(0.35066) = —0.04795 < 0, so that a’ = 0.074, b’ = 0.35066; 
and since Y, < Y;, we have 


(1 — 0.35066) (0.04795) 
1 — 0.04795 


S’ = max { 0.31950, 0.35066 — ) = 0.31950. 


Finally, n’ = 1. 
Cyrcie No. 3. Now we have a = 0.074, b = 0.35066, Y. = 0.76839, Y, = 
0.04795, S = 0.31950, n = 1, whence by our formula we get 


(0.35066 — 0.074) (0.76839) 
0.76839 + 0.04795 


and we have located the root on (0.31950, 0.33441). Next, we compute 


= 0.31950 + (0.33441 — 0.31950) 


0.76839 
= 0.31950 + (0.01491) p:(0.624) = 0.32440, 


and we find that (0.32440) = 0.02535 > 0, so that a’ = 0.32440, b’ = 0.35066, 
and 


W = 0.074 + 


= 0.33441, 


(0.32440 — 0.074) (0.02535) 


4 = 2440 
+ 0.76839 — 0.02535 


= 0.33294. 


Finally, n’ = 0. 
Cycuz No. 4. Finally, we have a = 0.32440, b = 0.35066, Y. = 0.02535, Y, = 
0.04795, S = 0.33294, n = 0, whence by our formula we get 


(0.35066 — 0.32440) (0.02535) 
0.02535 + 0.04795 


and we have located the root on (0.33294, 0.33348). Now n = 0, so the computa- 
tion ceases and the interval (0.33294, 0.33348) is recorded. 

Remark. Using the bisection technique, in 3 readings we would find the bracket- 
ing intervals of lengths 1, 0.5, 0.25, 0.125. Taking convexity into account, by our 


W = 0.32440 + 


= 0.33348, 
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Fig. 1 Fie. 2 


method we obtain intervals of lengths 0.5, 0.15686, 0.01491, 0.00054, an obvious 
improvement. 


4. Derivation of the Functional Equation. Given a continuous convex function f 
and two of its values, one positive and the other negative, (say f(z:) > 0 and 
f(a.) < 0, with 2; < x2), how should one search for the zero of the function that 
lies on the known interval (x; , x2)? Using the principle of optimality of the theory 
of dynamic programming [2], we formulate the problem in terms of minimizing 
the maximum length of interval on which we can deduce that the zero is located 
in n readings taken sequentially. 

First, by reduction to scale, we can always consider the diagram shown in Fig. 
1, where f(0) = 1, f(1) = —Y, Y > 0, W = 1/(1 + Y). Since f is convex, the 
zero must lie on (0, W). If we have just one more reading to make (n = 1), then 
we choose* x on (0, W) and calculate f(x). It can be shown by simple dominance 
arguments that no reading of f need ever be taken outside any interval on which 
the zero has been located. Having chosen x and calculated f(z), we encounter 
exactly one of the followlng cases (barring f(z) = 0, of course, the best possible 
case). 

Case 1. If f(x) = V > 0, then by drawing lines joining known points on the 
graph of f, we have the picture shown in Fig. 2, with the root located on [S, W"I, 
as implied by the convexity of f. 

Cask 2. If f(x) = —V’ < 0, the picture is shown in Fig. 3, with the root located 
on (max (0, S’), W”) as indicated. 

Let S” = max (0, S’). If 2 is the final point at which f(x) is to be determined 
(i.e., nm = 1) then it is chosen so as to minimize the maximum possible value of 
max (W’ — S, W” — 8S”) consistent with our choice. 

Let us consider the general n-stage process in which we have several more read- 
ings to make. In either of the two cases diagrammed above, all the essential data 
can be described by a basic triangle determined by a two-parameter system as 
follows. 

In Fig. 2, we can conclude that the graph of f lies above the line segment VS 
and below the segment VW’Y (where the letters stand for both points and values 


* The particular optimal choice will be derived in what follows. 
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Fig. 3 Fig. 4 
in the obvious manner). If we now draw the line segment SY, it may or may not be 
crossed by the graph of f; but if it is crossed at some point P, say, we can replace 
that portion of the curve lying below PY by PY. This device does not violate the 
convexity condition nor does it exclude the worst possible case. This can be shown 
by simple dominance arguments based on our knowledge of the function at any 
particular stage. 

The essential data can thus be described by the triangle VSY. Since a vertical 
reduction in scale leaves the problem invariant, and a horizontal reduction leaves 
it relatively invariant, the triangle VSY can be replaced by a triangle of standard 
form described pictorially by two parameters Y, S (say), as shown in Fig. 4. 

Similarly, in the second case (Fig. 3), the graph of f lies above S’V’ and below 
1W”V’. Draw the line segment from 1 to S’ and replace any portion of the graph 
of f lying below this line by the line itself from 1 to the peint of crossing. This does 
not affect the choice of subsequent values of x, since they will all be chosen on 
minimal bracketing intervals. Again, this can be shown by simple dominance argu- 
ments. Thus, in the second case also by a suitable reduction to scale we are led to 
another representation of Fig. 4. 


Now define 
R,(S, Y) = the minimum length of interval on which we can guarantee locating 
the zero in (0, 1] of any convex function f, given that f(0) = 1, f(1) = -—Y <0, 


the root is > S, and we have n readings to make. 
If n = 0, we clearly have 


RAS, Y) = S. 


1+ Y 
Next, using the principle of optimality, and taking into account the scale factors, 
we obtain for n > 0 the following recurrence relation: 


x 


— V’ 
max | > VV’ 
0s <¥ (z—8)/(1—S) - V’) ) 


with the upper and lower expressions after the brace corresponding to the first and 


R,(S,Y) = min 


S<2<1/(1+Y) 
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second cases, respectively. The scale factors are obtained in a completely elemen- 
tary manner by means of similar triangles. 

The ranges of the variables S, Y above are given by Y = 0,0 < S< 1/(1+ Y). 
To render the expressions more amenable to machine computation, we make the 
following substitutions: 


Ww W) =R,(S,Y), whence R,(S, Y) 7): 


1+ 
An additional modification of the quantified variables V, V’ finally reduces the 
system to do(S, W) = W — S, 


max ¢ 1¢ wii t) ) 
— 
where0 SSW 1. 

The functions ¢,(S, W) are then computed for n = 1, 2, 3, 4 by means of a 
discrete approximation using various grid sizes and linear interpolation. 

The minimizing values of x are recorded and these form the basis for our optimal 
policy; i.e., z* = x,*(S, W) is the point at which we evaluate our unknown function 
f given that the root lies on (8S, W) in our basic triangle and there are n evaluations 
to be made. The point z,* is, of course, itself measured on the basic triangle. To 
take care of the general situation, we must relate our readings to the original scale. 
If we let p,(S, Y) denote the fraction of the distance between S and W occupied 
by z,*, we readily obtain 


pn( S, Y) = — S 


where W = 1/(1+ Y).Itisnowa relatively easy matter to relate x,* to the original 
scale and thus to obtain the procedure cycle outlined in Sec. 2. 

Graphs of the functions R,(0, Y) and p,(0, Y) forn = 1, 2, 3, 4 are included 
at the end of this paper (Figs. 5 and 6). 


5. Remarks. We shall close with a few remarks intended primarily to validate 
certain assumptions made, tacitly or otherwise, in the derivation of the functional 
equation. 

1. Suppose we are at a certain stage in an optimal sequential minimax search. 
We have computed several values of f and are ready to choose our next point of 
evaluation. Let a, b denote the closest evaluation points on the left and right, re- 
spectively, of the minimal interval (S, W) on which the root is known to lie. Then 
in an optimal procedure, we choose x on (a, b). To see this, we need only state that 
since the unknown function f may indeed be linear both to the left of a and to the 
right of b (a possible contingency), it is easy to see that any such subsequent read- 
ing would in such a contingency afford us no information regarding the character 
of f within (a, b) pertinent to the location of its zero, not already implied by the 
quantities a, b, f(a), f(b), S and W, or indeed by any of our previous readings. 
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(S,Y) Py (0,Y)= (Y) 
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Similar but slightly more involved arguments can be given to show that the next 
reading should be taken on the interval (S, W). 

2. In the treatment of case 2, it was tacitly assumed that Y, < Y;ie., the S’Y 
line has a negative slope as shown in Fig. 3. Again, it can be shown by dominance 
arguments that the worst situation occurs when f is monotone, and indeed when 
the graph of f lies above the line SY. It is this condition that determines the limits 
of the quantified variable V in the upper line of the functional equation for R, . 
It is precisely such dominance considerations as these that, though enabling us to 
express the functional equation in a relatively simple form and to obtain an optimal 
“policy” therefrom via its recursive computation, prevent us from obtaining an 
optimal “procedure” directly from the equation. However, all the contingencies, 
together with the information about the function f they afford, are taken care of 
in the procedure outlined in Sec. 2. 

3. The function R,(S, Y) is separately decreasing in S and Y. To see this for 
the first variable, for example, upon recalling the definition of R,(S, Y), we need 
only observe that the information that the root is > S includes the information that 
the root is > S’ if S’ < S. On the basis of the additional information, then, we can 
clearly guarantee at least as short a final bracketing interval with a larger S value 
if we are proceeding optimally; i.e., we have R,(S, Y) S R,(S’, Y) if S > S’. 
An analogous argument applies for the second variable. 

4. Finally, we note that an analytic treatment of the case n = 1 provides a 
check on the machine computations. 


Rand Corporation, Santa Monica, California 


1. H. Surexps, “Determination of zeros of polynomials and real roots of trans- 
cendental equations,’”’ Report R-77, Instrumentation Lab., Mass. Inst. of Tech, May 1954. 

2. RicHarp BELLMAN, “The theory of dynamic programming,”’ Amer. Math. Soc., Bull., 
v. 60, November, 1954, p. 503-515. 

3. Creciz Hastines, Jr., Approximations for Digital Computers, Princeton University 
Press, Princeton, New Jersey, 1 


E 
7 
¥ 


On The Error Propagation in Adams’ 
Extrapolation Method 


By B. Zondek and J. W. Sheldon 


The error propagation in step by step integration methods is governed by the 
zeros of certain polynomials [1]. These zeros, \> , --- , An , are the error multipliers, 
i.e. an initial error propagates through the integration like 


(1) Nr = + ary’ + + 


where r is the step number and ap, --- , a, are constants. In this theory we regard 
the differential equation as being locally linear with constant coefficient. The Adams 
extrapolation method [2] of order n + 1 for a differential equation 


(2) y’ = f(x,y) 
is written symbolically 
(3) = Yr + h (Bo + AiV + BaV” 
where h is the step size. This leads to the equation 

1 n 
for the error multipliers Xo , , An; 
(8) - 

dy 

is considered constant in our approximation. The coefficients 8) , 6: , --- are gen- 


erated by the power series expansion 

(6) 
and are all positive. The first few are 


(7) Bo 1, 3, Be Ps, Bs = %, Bs #33 


For sufficiently small | k | (or | h |) one of the multipliers, say \> , has a larger 
absolute value than all the others, 4, --- , A», [3], and an initial error propagates 
about like aodo’. In fact it can be shown that for small | k | one root of equation 
(4) is 


(8) ho = + O(k"**), 
while the others are 
(9) O((kBn)"’"), 1, yn. 


Consequently, \o causes an error propagation about like that of the differential 


Received August 21, 1956. 
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equation (2) and we may estimate the accumulated error from the variational 
differential equation and the local errors. 

When we increase | h |, keeping n fixed, the multiplier \» differs more and more 
from e*. (This has been considered in detail for the Runge-Kutta method in [4], 
where the notion of fidelity is used.) Furthermore, the “small’’ multipliers, \; , --- , 
dn , may be expected to increase according to equation (9) and perhaps dominate 
the “‘natural”’ multiplier, \> . In the latter case there is danger of extraneous oscilla- 
tions appearing in the integration. 

When we increase n, keeping h fixed, the multiplier \) approaches e* according 
to equation (8) (provided | h | is not too large). On the other hand, from equation 
(9) and the fact that lim (8,)'"" = 1, we may expect the “small” multipliers to 
become more significant and perhaps to cause extraneous oscillations. 

We now state a lemma that will give us some information about the multipliers 
when | k | is not small. 

Lemma. Let 


(ii) k be real, 

(iii) k 0, 

(iv) k= —1, 

(v) +2 +2" |} <2+k; 
then h(\) has either one simple zero or no zero outside the unit circle, according 
ask > O or k < 0 respectively. 

Proor. Let = — 1 — kandg(A) = a(1 — 1/A) + --- + a,(1 — 1/A)”. 
Thus h(A) = f(A) — kg(A). Now let R be a real number R > 1. Then, for || = R, 
Min | f(A) | 2 | R — 1 — k|, and this increases without bounds in R. Further- 
more, for |A| = R, 


Max | | +5) } 


{lal2+--- 2°}, 


and this is bounded in R. Therefore there exists a real number R,; > 1 such that 
| f(\) | > | kg(A) | for |X| 2 R,. It is clear from this that h(A) has no zeros on 
and outside the circle | \ | = R; . To prove the lemma, it thus remains to show that 
f(X) and h(\) have the same number of zeros in the annulus 1 < \ S R, . Now by 
Rouché’s theorem [5] (applied here to a doubly connected domain), if | f(A) | > 
| kg(\) | on the boundary, f(A) and h(A) have the same number of zeros in the 
domain. The boundary consists of the two circles |\ | = R; and || = 1. Now R,; 
was chosen so that | f(A) | > | kg(A) | on | A| = R,. So it remains to demonstrate 
this inequality only on the circle | \ | = 1. 
On the unit circle ; 


|a,(1 — 1/A) + + — 1/d)"| 


| k 
| = k 
la 


<|k| 


=65. 
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Now write r, = |A — 1| and r, = | — 1 — k|. Then it is easily shown that for 
d on the unit circle and k real r. = {(1 + k)r, + k}"”, and therefore 


where 7; ranges from 0 to 2. Let us find the maximum of 6(r;) in this range. Differ- 
entiating we get 


d 
dr; 


ay | + 2k? | a2 | ri + 3k" | as | + (4K? | | 
=|k| + (1 +k) | ae|) +--+ + (mn +4) Lan | 
{(1 + k) + ’ 


and, since k #0 and k= —1 by conditions (iii), (iv) of the lemma, 
(d/dr;)6(r1) > 0, and therefore 5(r:), being continuous in the range, has its maxi- 
mum at r; = 2. This maximum is 


and by condition (v) of the lemma 6(2) < 1, so that on the unit circle | kg/f | < 
6(2) < 1, and the lemma is proved. 

Now we apply this lemma to the error propagation in Adams’ extrapolation 
method. There are two cases, k > 0 andk < 0. 

If k > 0, we are integrating in the direction of natural error increase (e* > 1). 
Condition (v) of the lemma becomes 


(10) + + 2"%B,) <1. 


This is a sufficient condition that there be only one multiplier outside the unit circle 
(and this must be the natural one). Consequently, if | k |"** « e (see equation (8)), 
the error propagation is approximately governed by the solution of the variational 
differential equation. We see, for example, that the 2nd order method (n = 1) 
always satisfies condition (10). 

If k < 0, we are integrating in the direction of natural error decrease (e" < 1). 
Condition (v) of the lemma becomes 


(11) —Kk(1 + 26): + + 2"B,) <2 


This is a sufficient condition for all multipliers to be within the unit circle. More- 
over if k has its critical value 


—2 
1+ --- + 2%, 


there is a multiplier —1. (This is seen by substitution into equation (4).) When 
k is smaller than its critical value, we may expect extraneous oscillations with in- 
creasing amplitude. Accordingly, if we want the error propagation to be approxi- 
mately governed by the variational differential equation, it is certainly necessary 
for condition (11) to be satisfied (in addition, of course, to the fidelity condition 


(12) k 
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|k|"*? « e*.) We get, for example, for the 4th order method (n = 3) the critical 
value k = —-35. We also note that the critical value of k tends to zero as the order 
is increased because pt 2’B, is a divergent series. 

In practice, for n not too small, condition (11) is almost necessary and sufficient 
for the natural multiplier, 4» = e* + 0 (k”**), to dominate the other multipliers. 

When f(z, y) is complicated, the labor of integration is proportional to 1/h. If 
we minimize this, subject to given truncation error and one of the conditions (10) 
or (11), we get optimum values of h and n. This type of optimization is discussed 
for orbit integrations in [6]. 


Naval Proving Ground, gy, Virginia and 
Computer Usage Company, New York, New York 


1. L. Cottatz, Numerische Behandlung von Differentialgleichungen, 2nd edition, Springer, 
Berlin, 1955, p. 106 and p. 128. References to several papers on the subject are given there. 
. CoLLATz, Numerische Behandlung von Differentialgleichungen, 2nd edition, Springer, 
Berlin, 1955, p. 79. 
3. H. RUTISHAUSER “Ueber Instabilitaet von Methoden zur Integration gewoehnlicher Dif- 
pratiaiaaiaman eit. f. ang. Math. u. Phys., v. 3, 1952, p. 65-72. 
4. Harry L. Reev, Numerical Integration of Oscillatory Ballistic Research Lab- 
oratory Report No. 957, October 1955. 
5. Z. Newart, Conformal Mapping, McGraw-Hill Book Co., Inc., New York, 1952, p. 131. 
6. J. W. SHELDON, B. ZonpEKk & M. FrrepMan, ‘On the time step to be used for computa- 
tion of orbits by numerical integration,’’ M TAC, v. XI, 1957, p. 181-189. 
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Reviews and Descriptions of Tables and Books 


1[A, B, C, D, F].—Tsunera Yano, Yano’s Tables of Calculation, The Tsuneta Yano 
Memorial Society, Dai-ichi-Seimei-Kan, Yurakucho, Tokyo, Japan, 1952, vi + 
162 p., 18 em. Price $2.50 + postage. 


This handbook for computation contains tables of multiplication and division, 
reciprocals, powers and roots, trigonometrical functions, factoring, testing pri- 
mality, logarithms, antilogarithms, cologarithms, conversion between common 
and natural logarithms, lists of formulas, and a conversion table of weights and 
measures. In addition to a preface written by his son, roughly half the book is de- 
voted to an explanation of the functions and the use of the tables. 


IRENE A. STEGUN 
National Bureau of Standards 
Washington, D. C 


2[B].—ARNE OLANDER, 9°”, Elementa, v. 41, 1958, p. 202. (Swedish). 


The author reports that there are exactly 3696 93100 significant decimal digits 
in this “largest number constructible conventionally with three digits”; the first 
twenty-two and last ten of these digits are given. 


EprrortaL Note: More extensive information regarding this number has appeared in 
MTAC, vol. II, p. 93-94, 224-225 (Notes 54 and 66, respectively). 


3[F]._D. H. Ler usr, “Tables concerning the distribution of primes up to 37 
millions,” 1957, 17 p., 844” x 11”, mimeographed. Deposited in the UMT 
Fre. 


These interesting tables concern the distribution functions r(x), (x) and 
Let +(x) be the number of primes < x. This is listed for x = 10°(10°)37-10° 


together with li(x) = [ (log «)~* dx and the difference and ratio (to 5D) of these 


functions. The difference ranges from 122 at 2-10° up to 630 at 37-10°, and the 
ratio ranges from 1.00166 at 10° down to 1.00014 at 31-10°. 

Let 2,(x) be the number of primes p < z for which p + k is also a prime (thus 
m2(x) is the number of pairs of twin primes < 2x, x + 2). This is listed for x = 
10°(10°)37-10° and k = 2(2)70 together with an appropriate multiple (depending 


on k) of li(x) = I (log x)~? dx (ef. [1]). The ratio (to 4D) is listed for « = 10°, 
2 


5-10°(5-10°)35-10° for each k. Also listed (to 4D) is m(x)/m(x) for x = 10°, 
5-10°(5-10°)30-10° for each k together with the expected (rational) ratio. 

Let m2,.(x) be the number of primes p S 2 for which p + 2 and p + k are also 
primes. This is listed for x = 10°(10°)37-10° and k = 6(2)72 with 3 + k + 2, 


together with a multiple (depending on k) of li;(x) = [ (log x)~* dx. The ratio 


of these functions and the ratio m2,.(2)/2,6(2) are listed as in the case of (x). 
56 
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Finally there is listed the least prime p, with given difference A = prs: — pn for 
A = 2(2)156, 160, 164, 170, 180, 182, 210. The 23 primes whose differences exceed 
those of all smaller primes are starred, difference 180 occurring for 170 51707 and 
difference 210 occurring for 208 31323. These entries extend the earlier list given by 
A. E. Western [2]. 


J. L. SELFRIDGE 
IBM Research 
Yorktown Heights, New York 


.G. H. Harpy & J. E. Lirruewoop, ‘Some problems of partitio numerorum III,’’ Acta 
Math., v. 44, 1923, p. 1-70. 
2. A. E. “Note on the magnitude of the difference between successive primes,” 
London Math. ,Jn., v. 9, 1934, p. 276-278. 


4{L].—NBS Applied Mathematics Series, No. 51, Tables of the Exponential Integral for 
Complex Arguments, U. 8. Government Printing Office, Washington, D. C., 
1958, xiv + 634 p., 26.8 cm. Price $4.50. 
The exponential integral used in this volume i is the function E,(z) defined by 
the integral 
E,(z) = (e“/u) du 


taken along any path from u = z = x + iytou = ~ + 0% avoiding a cut along 
the negative real axis; on the branch cut itself E,(z) has been defined for z = —x <0 
as —Ei(x) — im, where, with the usual interpretation as a principal value, 


Ei(z) = (e"/u) du. 


The function £;(z) has a logarithmic singularity at the origin, E,(z) + log. z being 
equal to the sum of a power series in z. 

Over 80 per cent of the tables relate to the first quadrant, the remainder to the 
second. As is remarked in the Introduction, more work remains to be done to bring 
the tabulation in the second quadrant up to the state of that in the first. The fact 
that so massive a volume does not exhaust the tabulation of one single function well 
illustrates the difficulty of making a thoroughly adequate table of a function of a 
complex variable. As is especially desirable in the case of tables with complex argu- 
ment, the Introduction explains methods of interpolation, both for the analytic 
functions tabulated and for the non-analytic function e*E,(z) given in Table III. 

As is usual with this series, the volume is a remarkably good value. It appears 
from the Preface that much the greater part of the work of computation was done 
by the New York organization under A. N. Lowan, presumably a decade or so ago. 

Details of the functions tabulated follow. There are no differences. 


Table I (p. 2-479). Ei(z), 6D, x = 0(.02)4, y = 0(.02)3(.05) 10. 
Table II (p. 482-503). Ei(z) + log. z, 6D, x = 0(.02)1, y = 0(.02)1. 
Table III (p. 506-587). e*E,(z), 6D, « = 4(.1)10, y = 0(.05)10. 
Table IV (p. 590-605). E,(z), 10D, —x = 0(.1)3.1, y = 0(.1)3.1; 
E,(z) + log. z, 10D, —z + 0(.1)1, y = 0(.1)1. 
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Table V (p. 608-633). e°E,(z), 6D, x = 0(1)20, y = 0(1)20; 
—zx = 0(.5)4.5, y = 0(.1)4(.5)10; 
—x = 4.5(.5)10, y = 0(.5)10; 
—x = 0(1)20, y = 0(1)20. 


Supplementary Table (p. 634). e*, 10D, x = 4(.05)10. 
A. F. 


S[L].—Axapem1a SSSR, Tablitsy integral’not pokazatel’not funkisii, (Acad. 
Sci, USSR, Tables of exponential integrals.) Izdatel’stvo Akad. Nauk SSSR 
(Press of Acad. Sci, USSR), Moscow, 1954, 301 p., 27 em. Price 27.75 rubles. 


This volume belongs to the Mathematical Tables series of the Institute of Exact 
Mechanics and Computational Techniques. 

The functions tabulated in the main table (pages 12-299) are the usual ex- 
ponential integrals 


Ei(z) = [ e't* dt, —Ei(—z) = [ve | 
the first integral being a principal value. Both integrals are tabulated to strictly 
7S for x = 0(.0001)1.3(.001)3(.0005)10(.1)15. Leaving aside the single page of 
values for x = 10 (where no differences are given), only on 21 pages is it necessary 
to provide either second differences for individual arguments or mean second differ- 
ences for each column; over much the greater part of the table, linear interpola- 
tion suffices. Except for alternate arguments between x = 3 and x = 10, where the 
interval was halved by the Russian calculations, values to more figures may be 
found in NBS publications [1]. The Russian tables are said to be based on pre- 
liminary verification and correction of the NBS tables; the corrections are not 
stated, and it seems unlikely that many were found to be necessary. 

On page 11 are tabulated the auxiliary functions Hi(x) — In x and —Ei(—zx) + 
In x to 7D for x — 0(.0001).0099. 

On page 300, and also on a loose card, are values of 3¢(1 — ¢) to 4D for t = 
0(.001)1. Second-difference interpolation (where needed) is also facilitated by a 
nomogram on another loose card. 

A. F. 


1. NBS MATHEMATICAL TABLES, Tables of Sine, Cosine and ——— Integrals, v. 1, 
MT5; v. 2, MT6, U. 8. Government Printing Office, Washington, D. C., 1940. 


6{L].—L. N. Karmazina, Tablitsy Polinomov Iakobi (Tables of Jacobi Polynomials), 
Akad. Nauk, USSR, Moscow, 1954, 250 p., 26 em. Price 27.9 rubles. 


This book gives the coefficients, roots, and values of the Jacobi Polynomials 
G,(p, q, x) form = 1(1)5; p = 1.1(0.1)3.0; ¢ = 0.1(0.1)1.0; z = 0(0.01)1.00. The 
Legendre polynomials, which are the special case p = 1, g = 1, are also given sepa- 
rately, with the same arguments in and x. The values and roots are given to 7D; 
the coefficients to 7S. No comparable set of tables is known to the reviewer. 
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The polynomials G, tabulated here can be expressed as 


-1 
G,(p, q, 2) = {(" (27 — 1) 
in terms of the polynomials P,, of Szeg6 [1], chapter 4. Note that the polynomials 
of this table differ by a normalizing factor from the polynomials G,(p, q, x) of 
Courant-Hilbert [2], p. 76. The polynomials of the table under review may be char- 
acterized by the properties that they are orthogonal on [0, 1] with weight function 
z* (1 — x)” *, and are normalized so that the coefficient of the term of highest de- 
gree is one. 

The tables were calculated at the Experimental Computation Laboratory of 
the Institute for Exact Mechanics and Computational Techniques of the USSR 
Academy of Sciences. There is a short introduction (in Russian) and four graphs 
showing the qualitative behavior of various families of the polynomials. The book 
is well printed, on good paper, and the numbers are easy to read. No differences 
are given, but the introduction has a section describing the accuracy obtainable 
from linear, quadratic, and cubic interpolation’ in p, g, and x. Nine recurrence 
formulas and two asymptotic formulas are also given to aid in extending the tables 
in n, p, and q. 

According to the introduction, the tables were calculated for each p, g, and n 
by successively applying a Gaussian difference formula in z, starting with x = 0. 
The value obtained at z = 1 was then checked against an easily derived independent 
formula for G,(p, g, 1). The resulting tables were then checked by differencing. 
No mention was made of the type of machine used in the calculations. 

In glancing through the table, one idiosyncrasy is immediately apparent. In 
many cases, the values of G, are simple terminating fractions; in these cases, this 
table often leaves us with a string of four or five nines. For example G,(3.0, 0.1, 
0.02) is given as —0.005 0000, while G,(3.0, 0.1, 0.03) is given as +0.004 9999. 
The positive values are given consistently in this way, and the internal evidence 
leads the reviewer to conjecture that the computation was performed to 8 or 9D 
on some type of tabulator with “nines complement” arithmetic in the counters, 
and the values were simply truncated to 7D, with no attempt to round anything off. 

To check the accuracy of the tables, the reviewer coded a program on the Univac 
Scientific model 1103 digital computer at the University of Minnesota Scientific 
Computing Laboratory, which calculated the coefficients and values of G, to 10D, 
for values of n, p, g, and x chosen “randomly” throughout the table by using a 
machine-generated sequence of pseudo-random numbers. The roots were not 
checked. In checking 100 values obtained in this way, no gross error was found, but 
the rounding conjecture above was substantiated. In 96 cases, the value given in 
the table was the truncated (unrounded) 7D approximation to the true value. 
The largest discrepancy found was certainly not serious: for G;(2.8, 0.1, 0.75) the 
true value rounds to —0.001 865 205, and the value given in the table is —0.001 
865 1. The claim given in the introduction that the errors in the values and roots 
do not exceed two units in the seventh decimal place and the errors in the co- 
efficients do not exceed one unit is probably correct, except for the ever-present 
spectre of the typographical error. There is one such error in the reviewer’s copy 
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on p. 13, the title page to the table of values; the exponent j of —1 is missing in the 
formula. 

To sum up, this book will be extremely useful to anyone who has a need for 
Jacobi polynomials; the fact that the book is in Russian need deter no one, since 
98 per cent of it is arabic numerals. The only criticism the reviewer has is a matter 
of esthetics; it is annoying to see .1249999 when 4 is meant, particularly when it 
is so easy to round off, even with the most primitive of computational facilities. 


Davin A. 


University of Minnesota 
Minneapolis, Minnesota 


me G. SzEG6, Orthogonal Polynomials, Amer. Math. Soc., Colloquium Publications, v. 23, 
1 


. 2. R. Courant & D. HiLBert, Methoden der Mathematischen Physik, v. 1, Springer, Berlin, 
1931. 


7[(W, X].—S. I. Gass, Linear Programming: Methods and Applications, McGraw- 
Hill Book Co., New York, 1958, 223 pages. Price $6.50. 


A topic in pure mathematics has suddenly become applied, and so much so 
that classrooms are filled with non-mathematicians learning about linear program- 
ming. This is a rather thorough text book for a class of non-mathematicians—and 
provides welcome motivation for the mathematician. This book is taken from the 
author’s notes for a one-semester, three-hour course, and, hence, should serve very 
well as a text. Figures, examples, and exercises are excellently delineated. 

The subject matter of linear programming can be divided into three areas: 
mathematical concepts, computational techniques and algorithms, and applica- 
tions. The author interlaces all three, always keeping in mind the fact that the 
reader is not a mathematician but one who desires to fake advantage of linear pro- 
gramming in his own discipline. One chapter is devoted to the necessary mathe- 
matics of linear inequalities and convex sets. This provides a complete working 
background, even if lacking rigor (see, for example, the definition of a determinant 
on page 14). 

The major portion of the book is given to theoretical and computational methods 
of solving the so-called general linear programming problem. As expected, a chapter 
is devoted to the simplex procedure, its justification and geometric interpretation. 
This is built upon the preceding chapter, which defines the problem and gives the 
fundamental theorems regarding existence and characteristics of solutions. Duality 
problems are next discussed, with emphasis upon the implications of symmetry. 
The next two chapters cover the revised simplex procedure and degeneracy 
problems. 

The author obviously has had considerable experience solving linear problems 
on large electronic computers. Welcome comments allow the reader to benefit from 
this experience. An example is a discussion minimizing the importance of degeneracy 
techniques to practical problems and setting these procedures in a proper context. 
One chapter is devoted to preliminary analysis and computational devices which 
may be used to minimize computational work. Included is a survey of available 
digital computer codes. (We cannot pass the opportunity to note the error in the 
subtitle on page 130: ‘‘Available Digital-computer Coeds.’’) 
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The third part of the book deals with applications and includes one chapter on 
the transportation problem. One of the strong points of the book is a description 
by category of many typical applications. The categories include inter-industry 
problems, diet problems, industrial applications (by industry), economic analysis, 
military applications, and scheduling problems. An excellent attempt has been 
made to give the reader a grasp of the wide applicability of linear programming. 

The final chapter discusses the connection between linear programming and the 
theory of games. This is really independent of the rest of the book, but serves well 
to tie the main topic to other mathematical areas. It seems too bad that a little 
more has not been said about non-linear problems. Clearly, this is not the subject 
of the book but would serve to better place linear programming in the greater 
concept of optimizing problems. 

An excellent bibliography is included. 

E. C. Smrru, Jr. 


International Business Machines Corporation 
Los Angeles, California 


8[X].—L. Fox. The use and construction of mathematical tables. National Physical 
Lab., Math. Tables, v. 1, Her Majesty’s Stationery Office, London, England, 
1956, 60 p., 27.5 em. Price 17s.6d.net. 


The Mathematics Division of the National Physical Laboratory was formed 
in 1945 and part of its mission is the construction of new mathematical tables and 
the checking and extension of existing tables. In order to make the results of this 
work available to the public, a Mathematical Tables Series is planned. The present 


volume is a general introduction to the series, which will usually carry less funda- 
mental tables than those which are in the volumes prepared for the Royal Society 
Mathematical Tables Committee, and in which the tabular material will be printed 
by photographic processes. The booklet, therefore, is addressed to the consumer 
of fairly special functions and will not be found easy reading by comparative 
novices. In order to avoid repetition, in this first volume, standard notation and 
standard methods of interpolation and table making are described so that future 
volumes, e.g. 

2. L. Fox, Tables of Everett Interpolation Coefficients 

3. G. F. Miller, Tables of Generalized Exponential Integrals 
need only discuss special devices appropriate to the particular function. 

This booklet is divided into three chapters, which we shall discuss separately: 
A. The use of mathematical tables; B. Derivation of formulae and analysis of 
error; C. The construction of mathematical tables. A list of references is given for 
A and B, and for C. 

A. The use of mathematical tables. It is pointed out that volumes in the series 
will make use of auxiliary functions where these are convenient, and that changes 
in the dependent or independent variable will also be usual. Examples are given 
to show what we may expect. 

Standard methods for direct univariate interpolation are discussed in some 
detail, and are compared. 

i) Reduced derivatives, i.e., Taylor’s Series. 
ii) Bessel and Everett formula, with and without throw-backs of various kinds. 
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iii) Lagrangian methods. 
iv) Chebyshev polynomials. 

The first three methods are familiar. We describe the fourth briefly. We define the 
Chebyshev polynomials by 


T,(z) 


cos are cos 2), 


n = 0,1, 2, --- 


Since the leading coefficient of 7,(2) does not vanish, it is possible to express x” as 
a linear combination of T,, Tn-2, Tn-4, --- . Accordingly, if we have an appro- 
priate interpolating polynomial 


fo = L(p) = a + ap + ap + --- +a,.p"" 


we can rearrange it as 
Sp = L(p) = bo + iTi(p) + beT2(p) + + ba Tra(p) 


In virtue of the minimum deviation property of the T,(x) in —1 S x S 1, it is 
possible that an adequate approximation will be obtained by fewer terms of the 
latter series and we may have 


Sp = In(p) = bo + iT i(p) + beT2(p) + + bmaTma(p), m <r. 


We now expand the T',(p) as polynomials and collect terms to find an “economized 
polynomial” 


fo = Li(p) =o + + 


Tabulation of the co, c,, C2, +--+ , Cm—1 Will enable interpolation to be done more 
conveniently than the tabulation of the ao, a; , a2, -*- , a1. Various devices are 
used to improve the convenience of this method. For instance, ¢ is near fy and it 
is possible to use an expression 


fp Lal) = fo + dip + dip’ + 


There is a short section in inverse interpolation. The two-machine balancing 
method [1] is recommended when one is working in a table with differences. In case 
one is working in a table in which the coefficients of the (economized) interpolating 
polynomial are given, the problem is immediately that of the solution of an algebraic 
equation: the use of Newton’s method is suggested. 

There follows a discussion of methods of obtaining the first and second deriva- 
tives, and the integral of a tabulated function, which are appropriate when par- 
ticular means of interpolation have been provided. 

The final section of the first chapter deals with bivariate interpolation. Generally, 
the use of the Everett formulae would appear to be most satisfactory [2]. 

Throughout this chapter, there are compact worked examples, and statements 
of maximum errors, under various limitations. 

B. Derivation of formulae and analysis of error. This chapter carries the justifi- 
cation of various material in the first chapter. 

For instance, there is a detailed derivation of the coefficients a;, b; for the 
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simultaneous throw back of the sixth and higher differences into the second and 
fourth: 


Smif = Of + + ad'f + --- 
= + + + --- 
for use in an Everett formula 
So = (1 — p)fo + + fo + + Esbn'fo + 


Then there is a thorough discussion of the error incurred in the use of the above 
formula, or of the related ones: 


fo = LQ — p)fo + phi + fo + + Mey‘fo + 


where 


= — 0.1845n'f, M,= 1000(E,+0.184E:), N,= 1000(F,+0.184F,), 
= 5n'f/1000; 
fo = (1 — p)fo + phi + fo + Fain fi + + 


where P, = M,/10, Q, = N,/10, 5,‘f = 5n‘f/100. It is concluded that the last 
arrangement is the most satisfactory and in order that this method should be 
convenient, the second volume in the series will contain a table of EF, , F, to 9D with 
P,Q, to 5D, at an interval of .0001 in p. 

There follows an elaborate examination of the process of economization, in- 
cluding an explanation of its relation with the ‘throw-back’. It follows from the 
Chebyshev theory, if x(p) is the polynomial of degree n which satisfies x(0) = f(0), 
(1) = f(1) and is the best (Chebyshev) approximation to f(p) in 0 S p < 1, 
then e(p) = w(p) — f(p) has in (0, 1], » interior extrema, of the same absolute 
value, but alternating sign. This property of e(p) is also possessed by 


Tns1{(2p — 1) cos (x/2n + 2)}. 


It is reasonable, therefore, to consider a representation of an approximating poly- 
nomial in the form: 


= + 2> a.T{(2p — 1) cos (x/2i)}. 


It can be shown that the Bessel coefficients can be expressed as linear combinations 
of the T{(2p — 1) cos (x/2z)}, and so the Bessel expansion can be rearranged in 
the above form. We thus obtain a Bessel-Chebyshev interpolation formula: 


fo = (1 — p)fo + phi + Wamp fie + fie + + 


where the @, are, apart from a normalizing factor, T,{(2p — 1) cos (#/2n)} and 
where the coefficients up "fy, p "fie can be represented as (infinite) series in the 
central differences of f. These formulae are “among the most efficient of all for- 
mulae”. They have, however, some drawbacks, e.g., they involve differences of all 
orders. It is, however, possible to obtain from them an Everett-Bessel-Chebyshev 
formula in which only the even differences are involved. It is also possible to obtain 
directly, following Kopal, an Everett-Chebyshev formula: 


fo = la + + vs(q)o* +--+ + + + + --- jh, 
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where g = 1 — p, where pon4i(z) is a constant multiple of T2.4:{p cos (/4n + 2)} 
and where each o” is a series involving even central difference operators. 

The relative merits of the three schemes are examined and it is concluded that, 
in practice, the Everett-Bessel-Chebyshev scheme is preferable. 

C. The construction of mathematical tables. The author now states, correctly, 
that almost any technique in numerical analysis can be used in the production of 
the entries of a table. Accordingly he proceeds to give what is a rather scrappy 
syllabus for an introductory course in numerical analysis. Most of this chapter 
seems entirely out of place in the present booklet. It is a pity that an opportunity 
to give some account of the classical methods of British table makers has been 
missed. 

Joun Topp 


California Institute of Technology 
Pasadena, California. 


1. See e.g. Her Majesty’s Nautical Almanac Office, Interpolation and Allied Tables, Her Ma- 
jesty’s Stationery Office, London, England, 1956. 

2. Compare, T. H. Sournarp, “‘Everett’s Formula for Bivariate Interpolation and Throw- 
back of Fourth Differences,”” MTAC, v. 10, 1956, p. 216-223. 


9[X].—I. P. Naranson, Konstruktive Funktionentheorie, translated by K. Bogel, 
Akademie Verlag, Berlin, 1955. xiv + 515 pp. Price 36 DM. 


The term “constructive theory of functions” is due toS. N. Bernstein. The sub- 
ject derives from the work of Chebyshev and his pupils, for instance, Korkine, 
Zolotareff and the brothers Markoff, and was set up as an independent discipline 
by Bernstein, to whom and to whose pupils much of its later development is due 
[1, 2]. Basically the theory is concerned with the approximate representation of 
functions in terms of simpler ones: it is therefore likely to be of considerable prac- 
tical value but it is also an intrinsically interesting area of mathematics. 

To indicate the content of the subject, we shall mention some of the results 
which are easy to present. The book is divided into three parts: I. Uniform approxi- 
mation; II. Approximation in the mean; III. Interpolation and approximate 
quadrature. 

I. The first part is concerned with the approximation of functions f(x), con- 
tinuous in a finite interval [a,b] by polynomials, and that of continuous periodic 
functions F'(6), by trigonometrical polynomials. The fundamental results are Weier- 
strass’ Approximation Theorems. Of the many proofs for the first of these, it is 
natural to choose that of Bernstein. What is proved is the following: 


If k) = (7) — x)"*f(k/n), then Bu(f, x) f(x), uniformly in 


[0,1]. 
Of the many proofs for the corresponding result for F(@) that of de la Vallée Poussin 
is chosen. It is: 
(2n) 2n 
If ValF, = cos" 406 — 8) thenV F(0), 
a(2n 1) !! 

uniformly for all 
The actual equivalence of these two theorems is demonstrated. 

The possibility of approximation established, the question of best approxima- 
tion arises. This problem was solved in principle by Chebyshev, who gave a quali- 
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tative characterization of the polynomial of best approximation to f(z). A clear 
account of the Chebyshev theory is presented. The properties of the Chebyshev 
polynomials T,,(2) = cos (n areccos x) are discussed in detail. It is noted, for ex- 
ample, that T,,(x), which is characterized, among polynomials of degree n and 
leading coefficient unity, as that which has minimum deviation from zero in [—1, 1], 
has also the largest deviation from zero in | «| > 1. Specifically we have: 

If x,(x) is a polynomial of degree not exceedingn, andif M = max_s<2<1| |, 
then for any real —,|£| > 1, we have | x,(€) | < M| T,(€) |. 

The next question to be discussed is how various local properties of f(x), e.g. 
the behavior of its modulus of continuity, or its differential coefficients, affect the 
degree of approximation which is possible. In the opposite direction, there is a 
study of how the degree of approximation which is possible restricts the function. 
Let 


E,(f) = min max | f(t) — pa(z) | 


where the maximum is over all x, a S x S b, and the minimum over all polynomials 
pn(x) of degree n, and define Z,(F') similarly. Then, in virtue of the Weierstrass 
Theorems, E,(f) — 0, E,.(F) — 0. It might be expected that the rate of convergence 
of {E,} increases as the behavior of the function improves. This is indeed the case. 
For instance, Dunham Jackson showed that 


E,(F) 120(n) 
where w(h) is the modulus of continuity of F, that is, 
w(h) = max | F(x) — |, 
while Bernstein showed that a necessary and sufficient condition for F(x) to satisfy 
a Lipschitz condition of order a, 0 < a < 1, namely, 
| F(x) — F(é)| M constant, 
is that 


| £,(F)| A constant. 
In this area there belongs the theorems of the brothers Markoff. We quote that 
of A. A. Markoff. 
If x,(x) is a polynomial of degree at most n such that 


| w(x) | $1, 
then we have 
| (x) | <n’, 


That this result is a best possible one is shown by the case of T,(x): it is easy to 
verify that 


| | = 


The latter chapters of the first part are concerned with the best approximation 
to functions which are analytic on a segment of the real axis, and with the behavior 
of the Fourier-series and its Fejér and dela Vallée Poussin sums as approximations 
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to periodic F(@) and that of the Bernstein and Chebyshev polynomials to a general 
f(x). Among the less familiar results is one, of a negative character, due to Woron- 
owskaja. 

If f(x) is a bounded function which has a finite second derivative f” (£) at a point 
£, then 


II. The second part contains material which is likely to be more familiar, 
essentially the L*-theorem of orthogonal systems, in particular polynomials. There 
is an account of the moment problem both in the finite and infinite case. Among 
the less familiar results which are discussed is the following problem of Zolotareff 


and Korkine. Consider, for all polynomials #,(x) of degree n and leading coefficient 
unity, the integral 


| | dx. 


For what polynomial #,(x) is this minimal? The extremal function can be shown, 


by comparatively elementary means, to be the Chebyshev polynomial of the 
second kind: 


O.(2) = sin (n + 1 are cos x) 
sin (are cos 2) 


III. The third part begins with a discussion of Lagrangian interpolation, in 
the case of distinct nodes and in the case of multiple nodes (Hermite). The question 
of the convergence of a sequence of Lagrangian interpolations, given sequences of 
©: a”; a; --- is then considered. Suppose L,(f, x) interpolates f at 
a”, a”, --+ , ax”; what can be said about the convergence of {L,(f, x)} for a 
fixed x? Clearly if f is a polynomial (or an entire function), the sequence is uni- 
formly convergent, no matter how the nodes are distributed. However, given any 
set of nodes {z;‘}, it is possible to construct a continuous function such that 
{L,(f, x)} is not uniformly convergent to f(x). The special case, when f(x) = | x | 
and when the {z;‘”}, are equally spaced in [—1, 1] for each n, is discussed: here 
there is convergence at the points 0, +1, only. 

The remainder of the book is concerned with approximate quadrature and 
includes discussion of the quadratures connected with orthogonal polynomials 
and the convergence of quadrature schemes. Suppose given two triangular matrices: 
one of abscissae 2;"); 2; x; --- (for which a < 2;” < b always holds) and 


one of weights :"; d:”, 2”; ---. Then we consider the sequence of 
quadratures 
Q,(f) = 
and can ask whether 
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It is shown, for instance, that a necessary and sufficient condition that (x) 
should be true of every continuous function, is that it should be true for poly- 
nomials and that | | K,n = 1,2,---. 

The book appears to be written clearly and the printing makes reading pleasant. 
In the early part of the book the demands on the reader are light, only the facts of 
classical real variable theory being assumed. ‘The author has decided to forego com- 
plex variable methods, at the expense of brevity in some proofs and the omission of 
some material, and defends this point of view in the preface. Exposition seems 
excellent—although a rather unmotivated account of the Bernstein proof of the 
Weierstrass theorem is a rather tough beginning. 

There is a great deal to be said in favor of including material of the kind covered 
in this book in regular courses in advanced calculus or real variable theory, and in 
courses on the theoretical aspects of numerical analysis, where the sharpness and 
elegance of many of the results will compensate for some of the unavoidable vague- 
nesses and crudities of practical numerical analysis. This point of view has been 
elaborated by the reviewer elsewhere [3]. ; 

Joun Topp 


California Institute of Technology 
Pasadena, California. 


2. 8. N. BernstTEIn, ‘‘Constructive theory of functions as a development of Chebyshev’s 
nee i Russian. Izv. Akad. Nauk. 9, 1945, 145-158. 
I. AntezEeR, Academian 8. N. Bernstein and his work on the constructive theory 
of uaa in Russian. Isdat. Hav’kov. Gos. Univ., Kharkov, 1955. 
3. JOHN "Topp, Special poly nomials in numerical analysis, Proceedings, Symposium on 
Numerical Approximations, University of Wisconsin, 1958. 


10[X, LABOR DER TECHNISCHEN HocHscHULE WIEN, 
MTW Mitteilungen, vol. 3, 1956, 312 pp., vol. 4, 1957, 420 pp., Price DM 
10. per volume. 


The subtitle of this bi-monthly periodical reads, translated, ‘A journal dedi- 
cated to relations among mathematics, technology and economy.” Its pages are 
divided between contributed articles, progress reports from the Mathematical 
Laboratory of the Technische Hochschule of Vienna, book reviews, contents of 
journals, literature surveys, information items, communications of the Gesellschaft 
fiir Angewandte Mathematik und Mechanik (GAMM), “communications” of 
commercial companies—IBM, Bull, Remington-Rand—and advertising. 

The contributed articles account for about half of the total number of pages. 
Only a few of them are new research contributions; some are reports about past 
meetings or about newly established organizations, and there are also many worth- 
while expository articles. Subjects covered include numerical mathematics, scien- 
tific, engineering and management applications of digital computers, automation 
and cybernetics, analog machines and their applications. Few of the articles are 
specifically slanted toward the most modern large computing machines. Among 
the new contributions of general mathematical interest we mention (all titles 
translated from the German): A. Hirschleber and G. Exner, Approximate evalua- 
tion of Stieltjes integrals by means of selected ordinates; Hj. Kolder and N. Unter- 
steiner, Physical processes during an explosive decompression; H. Scholz, Contribu- 
tions to the Krylov-Bogolyubov approximation method for nonlinear ordinary 
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differential equations; W. Spindelberger, Determination of real roots of algebraic 
equations on small electronic computers. 

The progress reports of the Mathematical Laboratory consist mainly in lists of 
titles of completed tasks, without details about objective or methods. The Labora- 
tory operates an IBM 604 and an analog installation. 

Under the heading “Periodicals service” the tables of contents of about 20 
technical journals are reprinted. The “Literature reports” give references to new 
publications arranged by subject matter. Examples of subject matter fields covered 
are: algebraic equations; structure of matter; automation; eigenvalue problems; 
solid state physics; financial arithmetic; cybernetics; mathematical methods in 
biology; and about twenty others, similarly diversified. 

Franz L. Aur 


National Bureau of Standards 
Washington, D. C. 


11[Z].—R. L. Coserirr, Nonlinear Control Systems, McGraw-Hill Book Company, 
1958, viii + 327 p. Illus. Price $9.00. 


The field of nonlinear control systems has received little treatment in the pub- 
lished textbook literature, a rather unusual situation particularly in view of the 
fact that almost two dozen texts have been written on linear control theory and 
that control systems are in general nonlinear. The fact that control systems can 
often be treated satisfactorily by linear methods does not, of course, make a text- 
book on non-linear control theory less overdue. 

The book consists of eleven chapters arranged as follows. The first three chapters 
are devoted to introductory and linear theory aspects. Chapters four and five are 
devoted to small-signal and piecewise-linearization techniques for the analysis of 
nonlinear systems. The following three chapters are devoted to phase-plane and 
describing-function techniques. It is interesting to note, however, that the author 
steadfastly uses the term “nonlinear gain” rather than the more universally used 
term “describing function”. The discussion on nonlinear gain is marked by a fair 
amount of attention to asymmetrical systems in which a d.c. component. must be 
considered as well as harmonics of the excitation frequency. 

The following chapter is devoted to linear equations with periodic coefficients. 
This chapter is introduced because the frequency response of a nonlinear system 
can often be determined by reducing the nonlinear differential equation to a linear 
equation with periodic coefficients. This chapter ends with a discussion of sampled 
data systems treated as linear systems with time-varying parameters. 

The last two chapters are devoted to the analysis of nonlinear control systems 
with random processes as inputs, and to the use of logic circuits in nonlinear control 
system actuation. 

This book represents an introductory rather than a comprehensive treatment. 
For instance, in the discussion of phase plane-techniques, the Lienard construction 
method, acceleration-plane method, phase-space methods, etc., are not treated. 
Systems with several nonlinearities distributed throughout the system are not 
treated by describing-function methods. The significant works of Feldbaum, Bose, 
Kuba and Kazda, etc., are not mentioned. Also no discussion is presented of the 
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powerful techniques of time-domain synthesis evolved by the Dynamic Analysis 
and Control Laboratory. However, as an introductory treatment which may serve 
as a stepping stone to some of these more powerful techniques, the book can be 
recommended. 

C. T. LeonpEs 


University of California, 
Los Angeles 


12[Z].—L. Lanpon GoopMan, Man and Automation, Penguin Books, Ltd., London, 
1957, 286 pp. illus. Price $0.85. 


An appropriate subtitle might be, “a strictly British view of automation.” 
While Mr. Goodman’s book is an interesting review of some of the problems facing 
British industry today, it is not likely to advance the technical or general knowledge 
of the American reader who possesses even a nodding acquaintance with modern 
industrial technology. 

Mr. Goodman, a highly respected British industrial engineer, begins with the 
accepted definition of “automation” that ties the word to a closed-loop feedback 
control system not involving routine human intervention. He briefly describes the 
development of automatic and automated processes through history, identifies 
areas in which automation could fruitfully be employed, and then proceeds to 
extended discussion of these major premises: 

1. Automation offers the only way in which the British standard of productivity 
can be raised sufficiently to enable Britain to compete in the world market, par- 
ticularly with the U. S., Germany, and Russia. 

2. British trade union policies are largely restricting introduction of automa- 
tion, particularly owing to: a) insistence on full employment at any cost; b) mis- 
understanding of the effects and potential benefits of automation; and c) inflexible 
and outmoded attitudes of crafts unions (for example, in Britain a plumber must 
walk to a household repair job; he may not even ride a bicycle). 

3. British management must awaken to new responsibilities in: a) development 
and research in new products and processes; b) improving industrial relations; and 
c) developing management skills through in-service and university training. 

4. The influx of new capital to British industry must be stepped up considerably 
to provide the tools for automation. He reveals that industrial Britain, once a world 
leader, now has the lowest capital investment per worker, the lowest energy avail- 
able per worker, and the lowest productivity per worker of any of the major indus- 
trial nations. 

As a revelation of the problems Great Britain faces today, the book is most in- 
teresting. With respect to the fetish of Labour for full employment, Mr. Goodman 
quotes a major Labour leader as publicly saying: “Who wants to work? We don’t 
care if you work or not, so long as you are getting full pay either way.” It is hard 
to conceive of an attitude that could be more destructive of national morale. 

Management in Britain is not let off lightly. Mr. Goodman is highly critical of 
the prevalence of reliance on tradition (called “common sense’’) rather than reason 
in making management decisions, and also of management’s failure to recognize 
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the value of education and directed development in preparing for management 
positions. 

Mr. Goodman concludes: “Only the companies which enthusiastically and sin- 
cerely adopt the newest ideas will flourish. The others will fall by the wayside.” 
For Labour’s benefit he adds these words: “The real threat to full employment is 
failure to adopt the benefits of scientific and technological advances.’ 

Ricwarp H. Him. 


University of California, 
Los Angeles 
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TABLE ERRATA 


268. J. B. Russexy, “A Table of Hermite Functions’, Journal of Mathematics 
and Physies, Vol. XII (1933), p. 291-297. 


An examination of pages 292 and 293 of these tables (¢,(2) for n = 0-3, in- 
clusive; x from 0-8, inclusive) revealed 35 errata. Values of e *” were taken or 
developed from U. 8. National Bureau of Standards, Applied Mathematics Series 
No. 14, Tables of the Exponential Function e*, and the supplementary Applied 
Mathematics Series No. 46, Table of the Descending Exponential. Exact 
values of the necessary Hermite polynomials were built up, using the appropriate 
polynomial expression. Products were carried to about 10 figures for comparison 
with Russell’s table. 


Hermite Functions—Russell’s Table 
Partial List of Errata 
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93726 
- 54608 
16448 
143927 
134048 
°44964 
931452 
1022898 


- 93725 
.54607 
16447 
43937 
134047 
844963 
934759 
1022897 


$2(x) 


| 


6.60 
7.00 


1.1169 
.62501 
-32663 
- 21964 


841516 
932057 


1.1168 
. 62500 
32662 
. 21968 


845882 
932056 


214277 
#20732 
*69850 
760908 


No formal examination was made of the balance of Russell’s table (¢,(x), n 
from 4-11, inclusive). However, an apparently large error in the common factor 
e * for x = 2.50 and x = 6.60 was carried along in ¢,(2) for a number of values 
of n. Hence ¢,(2.50) and ¢,(6.60) are suspect for n > 3. For those arguments, a 
quick test of the tabled values for 3 < n < 7 would seem to indicate that this 
supposition is well-founded. 

The argument value 0.53 should be corrected to read 0.52 on page 292. 


C. R. Sexton 
C. A. SExTon 


J. A. SExTon 
Berkeley, California 
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